Expression system for altered expression levels

ABSTRACT

A new expression system is provided which comprises component(s) of a lipase regulation cascade. The lipase regulation cascade as disclosed herein includes a kinase, a DNA binding regulator, a polymerase, a promoter, an upstream activating sequence, and secretion factors. Plasmids and transformed cells are also provided as well as methods of transforming host cells using the plasmids. Further, there is provided a kinase that can regulate the expression of a protein, a DNA binding regulator that can regulate the expression of a protein, a Pseudomonas alcaligenes polymerase, a Pseudomonas alcaligenes sigma 54 promoter, a Pseudomonas alcaligenes upstream activating sequence, the Pseudomonas alcaligenes secretion factors XcpP, XcpQ, XcpR, XcpS, XcpT, XcpU, XcpV, XcpW, XcpX, XcpY, XcpZ and the xcp regulators OrfV, OrfX.

RELATED APPLICATIONS

This application is a divisional application of U.S. Ser. No. 08/911,853, filed Aug. 15, 1997, now U.S. Pat. No. 6,048,710, which is a continuation-in-part application of U.S. Ser. No. 08/699,092, filed Aug. 16, 1996, abandoned, both applications being hereby incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates to the discovery of the lipase regulation cascade of Pseudomonas alcaligenes. Specifically, the present invention provides the nucleic acid and amino acid sequences of various components of the lipase regulation cascade which may be used in expression methods and systems designed for the production of heterologous proteins.

BACKGROUND OF THE INVENTION

The isolation and identification of a microorganism that can naturally secrete a product of potential industrial production is one of, if not the most, vital steps in the process of fermentation biotechnology. The ability to secrete the protein of interest usually leads to easier downstream processing. The next critical stage is the mutagenesis of a naturally occurring strain to a hyper-producing strain. Over a number of years, scientists have developed screening strategies from which a number of exo-protein producing bacteria have been isolated. Following isolation, a large number of rounds of mutagenesis can be used to continuously select higher producing strains. However, classical strain improvement cannot be used indefinitely to further increase production levels. Therefore, a more direct method of characterization and molecular genetic manipulation is needed to achieve higher production levels.

Several patents and publications have claimed or described a lipase modulator gene (WO 94/02617; EP 331,376; Nakanishi et al. (1991) Lipases-Struct. Mech. Genet. Eng. GBF Monographs 16:263-266). However, later research has shown that the product of the gene, now called lif, is concerned with folding of the lipase rather than regulating the expression of the lipase. A review of various lipase expression systems that use the lif gene product can be found in Jaeger et al. (1994) FEMS Microbiol. Rev. 15:29-63.

Another publication discusses the sigma 54 promoter and the types of genes that have been described to be under control of this type of promoter. Morett and Segovia (1993) J. Bacter. 175:6067-6074.

The search has continued for an expression system that can efficiently express a heterologous protein, particularly a lipase in Pseudomonas, in particular Pseudomonas alcaligenes. Pseudomonas expression of lipase is very difficult and often is at lower levels than industry would like to see.

The present invention solves the problem of low levels of expression of proteins in Pseudomonas as well as other microbial hosts.

SUMMARY OF THE INVENTION

The present invention relates to the discovery of a Pseudomonas lipase regulation cascade and provides individual components of the regulation cascade that can be used in expression systems for the production and secretion of proteins in host cells. The regulation cascade comprises, surprisingly, a two-component part that includes a kinase and a DNA binding regulator. The two components work in concert with a promoter and an upstream binding sequence to efficiently express a protein. The regulation cascade also comprises secretion factors that can be used in host cells to enhance the secretion of produced proteins.

The present invention provides nucleic acid and amino acid sequences for the various components of the Pseudomonas alcaligenes lipase regulation cascade. The present invention also provides new, efficient expression systems, i.e., expression vectors, and host cells that can be used to express proteins at increased levels. The new expression systems allow for increased expression of a protein whose gene is functionally linked to components of the expression system, i.e., components of the lipase regulation cascade. A hyper-producing strain can thus be developed and used in a commercial setting.

In one embodiment of the invention, an isolated nucleic acid encoding a kinase that can regulate the expression of a protein, preferably a lipase, is provided. The nucleic acid encoding a kinase is preferably derived from a Gram-negative bacteria such as a pseudomonad, preferably from Pseudomonas alcaligenes and is most preferably lipQ. Further, nucleic acid encoding the kinase preferably has the sequence as shown in FIGS. 1A-1B (SEQ ID NO: 1) and/or has at least 50% homology with that sequence. The kinase protein is also provided and it is preferably derived from a bacteria, preferably from a Gram-negative bacteria such as a pseudomonad, most preferably, the kinase is from Pseudomonas alcaligenes. In a preferred embodiment, the kinase is LipQ. The kinase preferably has the sequence shown in FIGS. 1A-1B, (SEQ ID NO: 2) and/or has at least 50% homology with that sequence.

In another embodiment, the present invention provides a nucleic acid encoding a kinase that can regulate the expression of a lipase in Pseudomonas alcaligenes. In another embodiment, the present invention provides a kinase capable of regulating the expression of a lipase in Pseudomonas alcaligenes.

In a further embodiment of the invention, an isolated nucleic acid encoding a DNA binding regulator that can regulate the expression of a protein, preferably a lipase, is provided. The DNA binding regulator nucleic acid is preferably lipR. Further, it preferably has the sequence as shown in FIGS. 2A-2B (SEQ ID NO: 3) and/or has at least 50% homology with that sequence. The DNA binding regulator protein is also provided and it is preferably LipR. The DNA binding regulator preferably has the sequence shown in FIGS. 2A-2B (SEQ ID NO: 4) and/or has at least 50% homology with that sequence. Preferably, the DNA binding regulator is from bacteria. More preferably, the DNA binding regulator is from a Gram-negative bacteria such as a pseudomonad. Most preferably, the DNA binding regulator is from Pseudomonas alcaligenes.

In yet a further embodiment, the present invention provides an isolated nucleic acid that encodes a DNA binding regulator that can regulate the expression of a lipase in Pseudomonas alcaligenes. In another embodiment, the present invention provides the DNA binding regulator itself.

In yet another embodiment of the invention, nucleic acid encoding a portion of a polymerase that can regulate the expression of a protein, preferably a lipase, is provided. The polymerase nucleic acid is preferably orfZ. Further, it preferably has the sequence as shown in FIGS. 9A-9B (SEQ ID NO: 36) and/or has at least 75% homology with that sequence. A portion of the polymerase protein is also provided and it is preferably OrfZ. The polymerase protein preferably has the sequence shown in FIGS. 9A-9B (SEQ ID NO: 37) and/or at least 75% homology with the sequence. Preferably, the polymerase is from Gram-negative bacteria such as a pseudomonad. Most preferably, the polymerase is from Pseudomonas alcaligenes.

In another embodiment, the kinase, the DNA binding regulator and a portion of the polymerase are present in one nucleic acid. In another embodiment, the kinase, the DNA binding regulator and the polymerase have the nucleic acid sequence shown in FIGS. 4A-4G (SEQ ID NO: 28).

In another embodiment of the invention, an isolated nucleic acid encoding a Pseudomonas alcaligenes sigma 54 promoter is provided.

In a further embodiment of the invention, an isolated nucleic acid encoding a Pseudomonas alcaligenes upstream activating sequence is provided. The upstream activating sequence is preferably UAS. Further, it preferably has the sequence as shown in SEQ ID NO: 5 and/or has at least 50% homology with that sequence. Preferably, the upstream activating sequence is from bacteria. More preferably, the upstream activating sequence is from a Gram-negative bacteria such as a pseudomonad. Most preferably, the upstream activating sequence is from Pseudomonas alcaligenes.

In yet another embodiment of the invention, isolated nucleic acids encoding secretion factors are provided. The secretion factors are preferably XcpP, XcpQ, OrfV, OrfX, XcpR, XcpS, XcpT, XcpU, XcpV, XcpW, XcpX, XcpY, XcpZ and another protein, OrfY, having the C-terminal amino acid sequence shown in SEQ ID NO: 35. Further, they preferably have the nucleic acid sequences as shown in SEQ ID NOS: 12, 14, 30, 16, 6, 8, 10, 18, 20, 22, 24, 26, 32 and 34, respectively, and/or have at least 90% homology with those sequences. The secretion factor proteins are also provided and preferably have the amino acid sequences shown in SEQ ID NOS: 13, 15, 31, 17, 7, 9, 11, 19, 21, 23, 25, 27, 33 and 35, respectively, and/or have at least 90% homology with those sequences. Preferably, the secretion factors are from bacteria. More preferably, the secretion factors are from a Gram-negative bacteria such as a pseudomonad. Most preferably, the secretion factors are from Pseudomonas alcaligenes.

In a further embodiment, the genes encoding the secretion factors XcpP, XcpQ, OrfV, OrfX, XcpR, XcpS, XcpT, XcpU, XcpV, XcpW, XcpY, XcpX and OrfY are present in one nucleic acid having the DNA sequence shown in FIGS. 3AA-3BB (SEQ ID NO: 29). The two xcp gene clusters, xcpP˜Q and xcpR˜Z, are oriented divergently, with OrfV and OrfX located between them, as shown in FIG. 8.

Another embodiment of the invention includes an isolated nucleic acid encoding a Pseudomonas alcaligenes lux-box binding element and orfV-box binding elements that can regulate expression of a protein.

Yet another embodiment provides nucleic acids that can hybridize to the nucleic acids shown in SEQ ID NOS: 1, 3, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 30, 32, 34 and 36 under high stringency conditions.

In a further embodiment, there is provided an expression system comprising a gene encoding a protein functionally linked to nucleic acids encoding a kinase, a DNA binding regulator, a polymerase, a promoter and an upstream activating sequence. The expression system can also include secretion factors, and their regulatory regions. Preferably, the regulating elements and the secretion factors are from bacteria. More preferably, the regulating elements and the secretion factors are from a Gram-negative bacteria such as a pseudomonad. Most preferably, the regulating elements and the secretion factors are from Pseudomonas alcaligenes.

Another embodiment provides an expression system that can regulate the expression of a lipase in Pseudomonas alcaligenes.

In another embodiment of the invention, replicating plasmids and integrating plasmids containing the expression system or a nucleic acid encoding one or more of the secretion factors are provided.

Also provided are methods of transforming a host cell with a plasmid that contains the expression system and/or a nucleic acid encoding one or more secretion factors as well as transformed host cells containing the expression system and/or a nucleic acid encoding one or more secretion factors. A host cell is transformed by introducing the plasmid to the host cell under appropriate conditions. Preferably, the host cell is electroporated to allow the plasmid to enter the host cell. Preferably, the host cell is bacteria. More preferably, the host cell is a Gram-negative bacteria such as a pseudomonad. Most preferably, the host cell is Pseudomonas alcaligenes.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1B show the DNA (SEQ ID NO: 1) and amino acid sequences (SEQ ID NO: 2) of LipQ from Pseudomonas alcaligenes.

FIGS. 2A-2B show the DNA (SEQ ID NO: 3) and amino acid sequences (SEQ ID NO: 4) of LipR from Pseudomonas alcaligenes.

FIGS. 3A-3B show the DNA sequence (SEQ ID NO: 29) of 17,612 bp from the insert on cosmid #600 containing the secretion factors XcpQ, XcpP, OrfV, OrfX, XcpR, XcpS, XcpT, XcpU, XcpV, XcpW, XcpX, XcpY, XcpZ and a part of another protein, OrfY, from Pseudomonas alcaligenes. The predicted amino acid sequences of the open reading frames (SEQ ID NO: 13, 15, 31, 17, 7, 9, 11, 19, 21, 23, 25, 27, 33 and 35, respectively) are shown in one-letter code below the DNA sequence. Likewise, the terminator sequences are shown as two bolded convergent arrows and the binding elements for the regulator OrfV (orfV-boxes) are shown as a bolded bordered line.

FIGS. 4A-4G show the DNA sequence (SEQ ID NO: 28) of the overlapping 4,377 bp fragment of cosmids #71, #201, #505, #726 that includes the open reading frames of LipQ, LipR and a part of OrfZ from Pseudomonas alcaligenes. The predicted amino acid sequences of the open reading frames (SEQ ID NO: 2, 4 and 37, respectively) are shown in one-letter code below the DNA sequence. Likewise, the terminator sequence is shown as two bolded convergent arrows, and the binding element for auto-inducers (lux-box) and the binding elements for OrfV (orfV-boxes) are shown as a bolded bordered line.

FIG. 5 shows the effect on lipase production of cosmid #505 at 10 liter scale. A threefold higher yield of lipase after fermentation was observed.

FIG. 6 shows production plasmid stability in production strain Ps1084 and Ps1084+cosmid #600 as determined by neomycin resistance.

FIG. 7 shows the theoretical scheme for the action of LipQ, LipR, the sigma 54 promoter and the upstream activating sequence on the DNA strand encoding LipA. The small rectangle on the DNA strand below the D-domain of LipR is the upstream activating sequence (UAS).

FIG. 8 shows the orientation of the xcp-genes from Pseudomonas alcaligenes on the map of cosmid #600 as extracted from SEQ ID NO: 29.

FIGS. 9A-9B show the DNA (SEQ ID NO: 36) and amino acid sequence (SEQ ID NO: 37) of OrfZ from Pseudomonas alcaligenes.

FIG. 10 shows the proposed model for the regulation cascade of the lipase from Pseudomonas alcaligenes.

DETAILED DESCRIPTION OF THE INVENTION

In order to further improve lipase expression in Pseudomonas alcaligenes, a pragmatic search for limiting factors was initiated. A cosmid library from the wild-type P. alcaligenes genome was used as a donor of DNA fragments to be introduced into a multicopy P. alcaligenes lipase production strain. In total, 485 cosmids were transformed, followed by screening of cosmid-containing P. alcaligenes strains with respect to their lipase production activity. Twenty cosmid strains were selected, each of which showed a significant enhancement of lipase expression as judged from various liquid and plate tests. The corresponding cosmids were also tested in a single copy lipase strain and some of them were found to give a threefold increase of lipase expression. The four best cosmids were found to share an overlapping fragment of 5.6 kb. The lipase stimulating activity was localized on a 4.5 kb fragment.

The present invention relates to the identification of a Pseudomonas alcaligenes lipase regulation cascade, which contains multiple components associated with the expression of lipase. As used herein, the term “regulation cascade” relates to the entire complex of individual components identified herein, such as kinase, DNA binding regulator, polymerase, UAS, lux-box, orfV-boxes, secretion factors and their regulatory regions. Components of the regulation cascade can be used alone or in combination with other components to modulate the expression of proteins in host cells. In a preferred embodiment, the host cell is a gram-negative host. In another embodiment, the host cell is a pseudomonad. In another preferred embodiment, the host cell is Pseudomonas alcaligenes.

Preferred desired proteins for expression include enzymes such as esterases; hydrolases including proteases, cellulases, amylases, carbohydrases, and lipases; isomerases such as racemases, epimerases, tautomerases, or mutases; transferases, kinases and phosphatases. The proteins may be therapeutically significant, such as growth factors, cytokines, ligands, receptors and inhibitors, as well as vaccines and antibodies. The proteins may be commercially important, such as proteases, carbohydrases such as amylases and glucoamylases, cellulases, oxidases and lipases. The gene encoding the protein of interest may be a naturally occurring gene, a mutated gene or a synthetic gene.

The 4.5 kb fragment was sequenced and found to encode the LipQ, LipR and polymerase proteins (FIGS. 4A-4G). While not intending to be bound by theory, it is believed that these proteins are involved in the regulation of the sigma 54 promoter in front of the lipase (LipA) and lipase modulator (LipB) gene region (see FIG. 7). These sigma 54 promoters characteristically have an upstream enhancer region, herein the upstream activating sequence or UAS, which is regulated by proteins. Regulation can be achieved by either a two-component system, such as NtrB-NtrC, or by a one-component system, for example NifA, in which the protein is in close association with the substrate (reviewed by Morett and Segovia, supra).

According to the present invention, expression of a protein can be regulated when a kinase and a DNA binding regulator, which are provided in trans, interact with a promoter and/or an upstream activating sequence which are functionally linked to a gene encoding the protein of interest. Preferably, the expression of the protein is increased.

A “kinase” is an enzyme that can catalyze the transfer of phosphate to either itself or another protein. The kinase of the present invention is preferably LipQ, a kinase that can regulate the expression of a lipase. A LipQ has been isolated from Pseudomonas alcaligenes. As such, the kinase preferably is encoded by a nucleic acid having the DNA sequence shown in FIGS. 1A-1B (SEQ ID NO: 1) and has the amino acid sequence shown in FIGS. 1A-1B (SEQ ID NO: 2). A kinase can act alone or as part of an expression system to regulate the expression of the protein. In some cases, the absence of this kinase will cause the expression of the protein to be decreased or eliminated.

A “DNA binding regulator” is a proteinaceous substance which physically interacts with DNA and, in doing so, influences the expression of genes close to the binding position. The DNA binding regulator is preferably LipR, a DNA binding regulator that can regulate the expression of a lipase. A LipR has been isolated from Pseudomonas alcaligenes. As such, the DNA binding regulator preferably is encoded by a nucleic acid having the DNA sequence shown in FIGS. 2A-2B (SEQ ID NO: 3) and has the amino acid sequence shown in FIGS. 2A-2B (SEQ ID NO: 4). A DNA binding regulator can act alone or as part of an expression system to regulate the expression of the protein. A DNA binding regulator of the present invention can be used alone or in combination with a kinase. The present invention encompasses variants of the DNA binding regulators disclosed herein that are capable of autophosphorylation. Such variants can lead to a constitutively higher expression of the target protein. In some cases, the absence of this DNA binding regulator will cause the expression of the protein to be decreased or eliminated.

As used herein “polymerase” refers to an enzyme that elongates DNA or RNA to obtain larger strands of either DNA or RNA, respectively. It is one of the most crucial factors in the production of proteins, such as lipase. In a preferred embodiment, the polymerase is OrfZ. Thus, in a preferred embodiment, the polymerase preferably is encoded by a nucleic acid having the DNA sequence shown in FIGS. 9A-9B (SEQ ID NO: 36) and has the amino acid sequence shown in FIGS. 9A-9B (SEQ ID NO: 37). The polymerase may play a role in modifying the expression of the desired protein.

Promoters are DNA elements that can promote the expression of a protein. A “sigma 54 promoter” is a bacterial promoter recognized by a member of a class of sigma factors with a size of approximately 54 kDa. These sigma factors are also known as RpoN proteins. Sigma 54 promoters and their functions are discussed in Morett and Segovia (1993) J. Bacter. 175:6067-6074. Preferably, the promoter is a Pseudomonas alcaligenes sigma 54 promoter. Most preferably, the sigma 54 promoter is the lipase promoter of P. alcaligenes (SEQ ID NO: 5) (WO 94/02617). According to the present invention, the sigma 54 promoter has an upstream activating sequence.

An “upstream activating sequence” is a binding position for a positively-acting DNA binding regulator. As indicated by its name, the upstream activating sequence is upstream of the transcription start site and is a nucleic acid. The upstream activating sequence is preferably UAS, an upstream activating sequence that can regulate the expression of a lipase, and is preferably derived from Pseudomonas alcaligenes. An upstream activating sequence can act alone or as part of an expression system to regulate the expression of the protein. In some cases, the absence of this upstream activating sequence will cause the expression of the protein to be decreased or eliminated. Preferably, the upstream activating sequence is the consensus: TGT(N)₁₁ACA. In the Pseudomonas alcaligenes lipase gene sequence, one specific region around -200 bp from the ATG start codon fits this consensus: TGTtcccctcggtaACA (SEQ ID NO: 5) (WO 94/02617).
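For illustration only, the consensus TGT(N)₁₁ACA can be written as a regular expression and used to scan a nucleotide sequence for candidate sites. The following is a minimal sketch and not part of the disclosed method: the function name `find_uas_candidates` and the flanking bases in the example are hypothetical, and only the 17-nucleotide site itself is taken from the text above.

```python
import re

# Consensus from the text: TGT, then any 11 nucleotides, then ACA.
# The lookahead lets overlapping candidates be reported as well.
UAS_CONSENSUS = re.compile(r"(?=(TGT[ACGT]{11}ACA))", re.IGNORECASE)

def find_uas_candidates(sequence):
    """Return (0-based start, matched site) for every consensus hit."""
    return [(m.start(), m.group(1)) for m in UAS_CONSENSUS.finditer(sequence)]

# The site quoted in the text, placed between hypothetical flanking bases.
site = "TGTtcccctcggtaACA"
flanked = "ggcc" + site + "ttaa"
print(find_uas_candidates(flanked))
```

A match to the consensus only marks a candidate; whether a given site actually functions as an upstream activating sequence would have to be established experimentally.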

A secretion factor is a protein that aids in secreting another protein from a cell. Preferably, the secretion factor is a member of the Xcp protein family and acts in concert with other members of the Xcp protein family. A genomic fragment encoding genes xcpQ, xcpP, orfV, orfX, xcpR, xcpS, xcpT, xcpU, xcpV, xcpW, xcpX, xcpY, xcpZ and the C-terminal part of protein OrfY has been isolated from Pseudomonas alcaligenes. As such, the secretion factors preferably are encoded by a nucleic acid having the DNA sequence shown in FIGS. 3AA-3BB (SEQ ID NO: 29). Specifically and more preferably, the XcpP secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 12 and has the amino acid sequence shown in SEQ ID NO: 13; the XcpQ secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 14 and has the amino acid sequence shown in SEQ ID NO: 15; the OrfV protein is encoded by the DNA sequence shown in SEQ ID NO: 30 and has the amino acid sequence shown in SEQ ID NO: 31; the OrfX protein is encoded by the DNA sequence shown in SEQ ID NO: 16 and has the amino acid sequence shown in SEQ ID NO: 17; the XcpR secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 6 and has the amino acid sequence shown in SEQ ID NO: 7; the XcpS secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 8 and has the amino acid sequence shown in SEQ ID NO: 9; the XcpT secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 10 and has the amino acid sequence shown in SEQ ID NO: 11; the XcpU secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 18 and has the amino acid sequence shown in SEQ ID NO: 19; the XcpV secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 20 and has the amino acid sequence shown in SEQ ID NO: 21; the XcpW secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 22 and has the amino acid sequence shown in SEQ ID NO: 23; the XcpX secretion factor is encoded by the DNA sequence shown in SEQ ID NO: 24 and has the amino acid sequence shown in SEQ ID NO: 25; the secretion factor XcpY is encoded by the DNA sequence shown in SEQ ID NO: 26 and has the amino acid sequence shown in SEQ ID NO: 27; the secretion factor XcpZ is encoded by the DNA sequence shown in SEQ ID NO: 32 and has the amino acid sequence shown in SEQ ID NO: 33; a part of protein OrfY is encoded by the DNA sequence shown in SEQ ID NO: 34 and has the amino acid sequence shown in SEQ ID NO: 35.
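The pairing of DNA and amino acid sequence identifiers recited above can be summarized as a lookup table. The sketch below is offered only as a reading aid: the identifiers are copied from the preceding paragraph, while the names `SEQ_IDS` and `mispaired` are hypothetical. The helper checks the pattern visible in the text, namely that each amino acid SEQ ID NO is one greater than the corresponding DNA SEQ ID NO.

```python
# (DNA SEQ ID NO, amino acid SEQ ID NO) for each protein, as recited in the text.
SEQ_IDS = {
    "XcpP": (12, 13), "XcpQ": (14, 15), "OrfV": (30, 31), "OrfX": (16, 17),
    "XcpR": (6, 7),   "XcpS": (8, 9),   "XcpT": (10, 11), "XcpU": (18, 19),
    "XcpV": (20, 21), "XcpW": (22, 23), "XcpX": (24, 25), "XcpY": (26, 27),
    "XcpZ": (32, 33), "OrfY": (34, 35),  # OrfY: C-terminal part only
}

def mispaired(table):
    """Return names whose amino acid SEQ ID NO is not the DNA SEQ ID NO plus one."""
    return [name for name, (dna, protein) in table.items() if protein != dna + 1]

print(mispaired(SEQ_IDS))
```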

Upstream of the lipQ gene, a promoter region has been identified. Within this promoter region, a lux-box can be recognized, see SEQ ID NO: 28. This lux-box shows significant homology to the binding site for luxR type regulator elements, which are known to be under control of autoinducer (Latifi et al. (1995) Molec. Microb. 17(2):333-343). This lux-box probably represents a linkage between the autoinducer system, LipR and lipase regulation. As such, another embodiment of the invention includes a nucleic acid encoding a lux-box element.

Upstream of the xcpP˜Q, xcpR˜Z gene clusters, the orfX, the orfV genes (SEQ ID NO: 29) and upstream of the orfZ gene (SEQ ID NO: 28), regulatory regions are present. A box can be recognized in the promoter region having the consensus sequence ANAANAANAANAA. These boxes are referred to as orfV-binding elements, because OrfV shows homology with the well-known Escherichia coli regulator MalT. Based upon OrfV homology with the known regulator MalT, OrfV may be a regulator. These orfV-boxes can control the expression of the Xcp-proteins, OrfX as well as OrfV itself. Similarly, the expression of the polymerase OrfZ may be controlled by the orfV-boxes, as shown in FIG. 10. As such, in another embodiment, the invention provides a nucleic acid encoding an orfV-box element.
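As with the upstream activating sequence, the orfV-box consensus ANAANAANAANAA can be expressed as a pattern in which N stands for any nucleotide. The sketch below is illustrative only: the function name `find_orfv_boxes` and the example sequence are hypothetical and are not taken from SEQ ID NO: 28 or SEQ ID NO: 29.

```python
import re

# Consensus from the text: ANAANAANAANAA, with N = any nucleotide.
# The lookahead reports overlapping candidates too.
ORFV_BOX = re.compile(r"(?=(A[ACGT]AA[ACGT]AA[ACGT]AA[ACGT]AA))", re.IGNORECASE)

def find_orfv_boxes(sequence):
    """Return (0-based start, matched box) for every consensus hit."""
    return [(m.start(), m.group(1)) for m in ORFV_BOX.finditer(sequence)]

# Hypothetical sequence containing one box with flanking bases.
example = "ccATAAGAACAATAAgg"
print(find_orfv_boxes(example))
```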

Commonly, when describing proteins and the genes that encode them, the term for the gene is not capitalized and is in italics, e.g., lipQ. The term for the protein is generally in normal letters and the first letter is capitalized, e.g., LipQ.

The kinase, DNA binding regulator, promoter and upstream activating sequence will sometimes be referred to as “the regulating elements” for ease of discussion. The preferred regulating elements are LipQ, LipR, the Pseudomonas alcaligenes polymerase, the Pseudomonas alcaligenes sigma 54 promoter and Pseudomonas alcaligenes UAS, and can regulate the expression of a lipase in Pseudomonas alcaligenes as defined herein. The kinase, the DNA binding regulator and polymerase are proteins, and the promoter and the upstream activating sequence are nucleic acids. In transformed cells, DNA encoding the kinase and DNA binding regulator were multiplied using a plasmid which led in turn to a higher production of the kinase and DNA binding regulator. The increased production of the kinase and DNA binding regulator resulted in higher transcription from the sigma 54 promoter which provides higher expression of the protein of interest.

The kinase and DNA binding regulator of the present invention represent a two-component regulatory system. Preferably, the two components are LipQ and LipR and can regulate the expression of a lipase in Pseudomonas alcaligenes as defined herein. Although other two-component regulatory systems are known, a low degree of homology exists between individual pieces of those systems and the amino acid sequences shown in SEQ ID NOS: 2 and 4.

Embodiments of the invention include a kinase or a DNA binding regulator encoded by a nucleic acid having at least 50% homology with the DNA sequences shown in SEQ ID NOS: 1 or 3, respectively. Preferably, the homology is at least 70%, more preferably at least 90% and most preferably at least 95%.

Also provided are embodiments in which a secretion factor is encoded by a nucleic acid having at least 90% homology with the DNA sequence shown in SEQ ID NOS: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 30, 32, 34. Preferably, the homology is at least 95%, more preferably at least 98%. Homology can be determined by lining up the claimed amino acid or DNA sequence with another sequence and determining how many of the amino acids or nucleotides match up as a percentage of the total. Homology can also be determined using one of the sequence analysis software programs that are commercially available, for example, the TFastA Data Searching Program available in the Sequence Analysis Software Package Version 6.0 (Genetics Computer Group, University of Wisconsin Biotechnology Center, Madison, Wis. 53705).
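The manual determination described above, lining up two sequences and counting matches as a percentage of the total, can be sketched as follows. This is an illustration under stated assumptions rather than the method used by any particular software package: the two sequences are assumed to be already aligned to equal length, gaps are written as "-", the "total" is taken to be the number of aligned columns, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences carry the same residue.

    Assumptions (not from the source): inputs are pre-aligned and of equal
    length, a gap never counts as a match, and the denominator is the
    number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)

# Two hypothetical 8-nucleotide sequences differing at two positions.
print(percent_identity("ACGTACGT", "ACGTTCGA"))
```

Other conventions for the denominator exist, for example the length of the shorter sequence or the number of ungapped columns, and they give different percentages for gapped alignments; the convention should be stated whenever a homology threshold is applied.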

One can screen for homologous sequences using hybridization as described herein or using PCR with degenerate primers. Chen and Suttle (1995) Biotechniques 18(4):609-610, 612.

Also, in several embodiments of the invention, there are provided nucleic acids that can hybridize with the DNA or fragments thereof, shown in FIGS. 1A-1B, 2A-2B, 3AA-3BB and 9, SEQ ID NOS: 1, 3, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 30, 32, 34, 36, respectively, under stringent conditions. Stringent hybridization conditions include stringent hybridization and washing conditions as is known to one of ordinary skill in the art. Hybridization and appropriate stringent conditions are described in Sambrook et al. 1989 Molecular Cloning 2d ed., Cold Spring Harbor Laboratory Press, New York.

“Bacteria” include microorganisms of the class Schizomycetes. Bacteria can be either Gram-negative or Gram-positive. Gram-negative bacteria include members of the genera Escherichia, Hemophilus, Klebsiella, Proteus, Pseudomonas, Salmonella, Shigella, Vibrio, Acinetobacter, and Serratia. Gram-positive bacteria include members of the genera Bacillus, Clostridium, Staphylococcus, Streptomyces, Lactobacillus and Lactococcus.

Gram-negative bacteria can be pseudomonads which are strains that are members of the genus Pseudomonas. Examples include Pseudomonas aeruginosa, Pseudomonas cepacia, Pseudomonas glumae, Pseudomonas stutzeri, Pseudomonas fragi, Pseudomonas alcaligenes and Pseudomonas mendocina. A preferred pseudomonad is Pseudomonas alcaligenes. Pseudomonas alcaligenes is also sometimes referred to as Pseudomonas pseudoalcaligenes.

Lipases within the scope of the present invention include those encoded by LipA, which is generally found in close association with a modulating gene known as LipB, LipH, LipX or Lif. Lif from Pseudomonas alcaligenes is the subject of patent application WO 94/02617 as discussed above. LipA genes can be found in a variety of species of bacteria such as Pseudomonas aeruginosa, Pseudomonas stutzeri, Pseudomonas alcaligenes, Pseudomonas cepacia, Pseudomonas glumae, Pseudomonas fragi, Pseudomonas mendocina, Acinetobacter calcoaceticus and Serratia marcescens.

Another embodiment of the invention provides an expression system that can regulate the expression of a protein, preferably a lipase. The expression system includes a kinase, a DNA binding regulator, a polymerase, a sigma 54 promoter and an upstream activating sequence. The expression system can also include secretion factors.

An expression system includes one or more proteins and/or nucleic acids which, when acting together, can increase the expression of a protein in a host cell. The expression system can be encoded on one or more plasmids and may or may not be on the same plasmid as the gene encoding the protein of interest.

The phrase “functionally linked” or “functionally coupled” means that the regulating elements (DNA or protein) interact physically in order to exert their function. This can be a protein/protein, DNA/protein or a DNA/DNA interaction. For example, the DNA binding regulator interacts with the promoter but genes encoding them may be at different sites on the chromosome. As such, the genes encoding the elements can be on different plasmids from each other and from the gene encoding the protein of interest and still work together to regulate expression of the protein.

A plasmid is a nucleic acid molecule which is smaller than the chromosome and can replicate independently of the mechanisms used for chromosomal replication. Typically, a plasmid is a circular DNA molecule. Plasmids can be inserted into host cells where they can replicate and make more copies of the plasmid; hence, replicating plasmid. Some plasmids, called integrating plasmids, can insert the plasmid DNA into the chromosome of the host cell. The plasmid DNA is thus integrated into the chromosome of the host cell. When this happens, the plasmid no longer replicates autonomously but instead replicates in synchrony with the chromosome into which it has been inserted. Thus, whereas a nonintegrated plasmid may be present at several dozen copies per chromosome and replicate independently of the chromosome, the integrated plasmid is present at one copy per chromosome and can replicate only when the chromosome does so.

One embodiment of the invention is directed to a method of transforming a host cell with a plasmid that includes the nucleic acid encoding the expression system. A host cell is a cell into which a plasmid of the present invention can be inserted through, for example, transformation. The host cell is preferably a bacterium. In one embodiment, the host cell is preferably a Gram-negative bacterium. In another preferred embodiment, the host cell is a pseudomonad. Preferably, the host cell is Pseudomonas alcaligenes and the regulating elements of the expression system are from Pseudomonas alcaligenes. The same host cell can be transformed with a further plasmid that includes a nucleic acid that encodes one or more secretion factors. Preferably, the secretion factors are from Pseudomonas alcaligenes.

A transformed host cell is a host cell into which one or more plasmids have been inserted. Transformation can take place by first making the host cell competent to receive the plasmid. The naked DNA is then added directly to the cells and some of the cells take it up and replicate or integrate it. One way of making the cells competent to receive the plasmid is by electroporation as described in the Examples below. Another method that is useful for construction and transfer of cosmid libraries is triparental mating (Kelly-Wintenberg and Montie (1989) J. Bacteriol. 171(11):6357-62).

Lipases produced according to the present invention can be used in a number of applications. Lipases can be used in detergents and other cleaning formulations as well as a number of industrial processes.

EXPERIMENTAL

Materials and Methods

Bacterial Strains

All bacterial strains were propagated with 2xTY as a liquid or solid medium, unless otherwise stated, and are listed in Table 1. For P. alcaligenes strains, the medium was supplemented with the appropriate antibiotics: neomycin (10 mg/l), tetracycline (5 mg/l) and chloramphenicol (3 mg/l); and for transformed Escherichia coli, ampicillin was added at 100 mg/l. For cosmid containing Escherichia coli strains, the medium was supplemented with tetracycline (10 mg/l). P. alcaligenes and E. coli were grown at 37° C., aerobically.

TABLE 1

Bacterial strains used. Tet^(R), tetracycline resistant; Neo^(R), neomycin resistant; Cap^(R), chloramphenicol resistant; lip, lipase.

Strain          Relevant Characteristics

P. alcaligenes:
Ps #1           Cosmid #1 in Ps 824, Tet^(R), lip⁻
Ps #26          Cosmid #26 in Ps 824, Tet^(R), lip⁻
Ps #27          Cosmid #27 in Ps 824, Tet^(R), lip⁻
Ps #57          Cosmid #57 in Ps 824, Tet^(R), lip⁻
Ps #71          Cosmid #71 in Ps 824, Tet^(R), lip⁻
Ps #91          Cosmid #91 in Ps 824, Tet^(R), lip⁻
Ps #131         Cosmid #131 in Ps 824, Tet^(R), lip⁻
Ps #201         Cosmid #201 in Ps 824, Tet^(R), lip⁻
Ps #344         Cosmid #344 in Ps 824, Tet^(R), lip⁻
Ps #371         Cosmid #371 in Ps 824, Tet^(R), lip⁻
Ps #399         Cosmid #399 in Ps 824, Tet^(R), lip⁻
Ps #401         Cosmid #401 in Ps 824, Tet^(R), lip⁻
Ps #404         Cosmid #404 in Ps 824, Tet^(R), lip⁻
Ps #490         Cosmid #490 in Ps 824, Tet^(R), lip⁻
Ps #505         Cosmid #505 in Ps 824, Tet^(R), lip⁻
Ps #540         Cosmid #540 in Ps 824, Tet^(R), lip⁻
Ps #597         Cosmid #597 in Ps 824, Tet^(R), lip⁻
Ps #600         Cosmid #600 in Ps 824, Tet^(R), lip⁻
Ps #638         Cosmid #638 in Ps 824, Tet^(R), lip⁻
Ps #726         Cosmid #726 in Ps 824, Tet^(R), lip⁻
Lip34           Neo^(R), lip⁺
Ps537           lip⁺ (cured from production plasmid p24lipo1)
Ps824           lip⁻ (Lip34 cured from production plasmid p24lipo1)
Ps 1084         2 copies lipQ-R, lip⁺, Neo^(R), Cap^(R)
Ps93            res⁻, mod⁺
Ps1108          Ps93 containing inactivation of LipR in chromosome

E. coli K12:
K802            hsdR⁺, hsdM⁺, gal⁻, mel⁻, supE Δ(lac-proAB), galE, StrA/Z′, lacl^(q), zΔm15, proA⁺B⁺

TABLE 2

Plasmids used

Plasmid     Relevant Characteristics                       Reference

pLAFR3      Cosmid vector derived from pLAFR1, Tet^(R)     Staskawicz et al. 1987
p24Lipo1    lip⁺, neoR                                     equivalent to p24A2δ (see WO94/02617)
pUC19       lacZ′, rop⁻                                    Yanisch-Perron et al. 1985

Extraction of Extra-Chromosomal DNA

Cosmid and plasmid isolations were performed using the QIAprep Spin Plasmid kit, for 1 ml overnight culture, and the QIAfilter Plasmid Midi Kit, for 100 ml culture isolations (both Qiagen), according to the manufacturer's instructions. For Pseudomonas strains, lysozyme (10 μl/ml) was added to the resuspension mix and incubated for 5 minutes at 37° C. to aid cell lysis. Cosmid DNA was eluted from the QIAprep columns with 70° C. milliQ water, as recommended by the manufacturer. For cosmid isolations from 100 ml cultures, strains were grown overnight in Luria Bertani (LB) broth and the elution buffer was heated to 50° C.

Transformation of Pseudomonas alcaligenes

An overnight culture of P. alcaligenes was diluted 1:100 in fresh 2xTY medium (with 10 mg/l neomycin) and the culture incubated at 37° C., in an orbital shaker, until it had reached an OD₅₅₀ of 0.6-0.8. Following centrifugation (10 minutes at 4000 rpm), the bacterial pellet was washed twice with a half volume SPM medium (276 mM sucrose; 7 mM NaHPO₄ (pH 7.4); 1 mM MgCl₂). The cells were then resuspended in a 1/100 volume SPM medium. Cosmid DNA and 40 μl cells were mixed together and transferred to a 2 mm gap electroporation cuvette (BTX). The cells were electroporated with 1.4 kV, 25 μF, 200 Ω, in the Gene Pulser. The electroporation cuvette was washed out with 1 ml 2xTY medium and the cell mixture transferred to a clean 1.5 ml eppendorf. The transformation mixture was then incubated for 45 minutes at 37° C. After incubation, 100 μl was plated onto 2xTY agar supplemented with tetracycline (5 mg/l) or neomycin (10 mg/l) or both (depending on which P. alcaligenes strain was used for electroporation). The transformation of P. alcaligenes cells was carried out at room temperature.
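As a rough check on the pulse settings given above, the nominal field strength and the RC time constant follow directly from the stated voltage, cuvette gap, capacitance and resistance. The Python sketch below is illustrative only. The function names are hypothetical, and the time constant assumes that the 200 Ω resistor dominates the circuit resistance, which the protocol does not state.

```python
# Illustrative arithmetic only; the helper names are hypothetical.

def field_strength_kv_per_cm(voltage_kv: float, gap_mm: float) -> float:
    """Nominal field strength across the cuvette gap, in kV/cm."""
    return voltage_kv / (gap_mm / 10.0)


def time_constant_ms(capacitance_uf: float, resistance_ohm: float) -> float:
    """RC time constant in ms (assumes the stated resistor dominates)."""
    return capacitance_uf * 1e-6 * resistance_ohm * 1e3


# Settings taken from the protocol: 1.4 kV, 2 mm gap, 25 uF, 200 ohm.
field = field_strength_kv_per_cm(voltage_kv=1.4, gap_mm=2.0)
tau = time_constant_ms(capacitance_uf=25.0, resistance_ohm=200.0)

print(f"nominal field strength: about {field:.1f} kV/cm")
print(f"nominal time constant: about {tau:.1f} ms")
```

With the stated settings this gives a nominal field of about 7 kV/cm and a nominal time constant of about 5 ms; the time constant actually delivered depends on the conductivity of the sample.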

Transformation of Escherichia coli

Transformation of E. coli Wk6 cells was performed using electroporation. Transfer of the cosmids to E. coli K802 cells was performed by infection according to the supplier's instructions (Promega Corporation).

Example 1 Construction of a Cosmid Library From Pseudomonas alcaligenes DNA in E. coli

Chromosomal DNA extracted from P. alcaligenes was fractionated and ligated into cosmid pLAFR3 as described in the Materials and Methods section, above. After ligation, the mixture was transferred into E. coli as described. Tetracycline resistant colonies were isolated and cosmid DNA was prepared from each of them.

Example 2 Transformation of a P. alcaligenes Cosmid Library into P. alcaligenes Overexpressing Lipase

In total, 531 plasmid DNA preparations were isolated from cosmids grown in E. coli. These were transformed by electroporation (see Methods, above) into strain Lip34, a P. alcaligenes strain harboring plasmid p24Lipo1 expressing lipase, resulting in 485 cosmid containing P. alcaligenes strains.

Example 3 Selection of Cosmids Stimulating Lipase Expression

In total, 485 cosmids were transformed, followed by screening of the cosmid-containing P. alcaligenes strains with respect to their lipase production activity. Twenty cosmid strains were selected which showed a significant enhancement of lipase expression as judged from various liquid and plate tests (see Table 3). The corresponding cosmids were also tested in a single copy lipase strain and some of them were found to give a threefold increase in lipase expression. The four best cosmids were found to share an overlapping fragment of 5.6 kb. The lipase stimulating activity was localized on a 4.5 kb fragment of cosmids #71, #201, #505 and #726. Sequence analysis of this fragment revealed two open reading frames which showed homology with two component regulatory systems (see FIGS. 4AA-G). We have named the genes lipQ, lipR and orfZ. It should be noted that, of the four cosmid strains described, only the strains containing cosmids #71, #505 and #726, which have the complete OrfZ, give the highest lipase stimulation in the lactate test (second column in Table 3) in comparison to the strain containing cosmid #201.

TABLE 3

            Medium
Cosmid #    380 + Soy Oil    380 + Lactate    2xTY + hexadecane

1           35.25            19.00            13.00
26          35.25            14.75            9.00
27          26.50            18.25            10.00
57          35.75            9.25             7.50
71          40.25            27.25            16.67
91          22.75            23.00            18.00
131         41.30            11.00            3.00
201         39.00            18.00            10.00
344         32.50            11.00            8.30
371         25.50            13.75            15.00
399         23.00            27.00            9.00
401         26.25            11.75            3.00
404         23.75            21.00            7.00
490         27.00            13.25            16.00
505         63.50            28.75            15.00
540         50.50            17.75            4.25
597         47.00            25.25            25.25
600         32.00            17.00            19.00
638         34.75            8.25             11.00
726         36.75            25.25            21.00
control     20.80            11.50            11.50
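To make the comparison in Example 3 easier to follow, the Table 3 values for the four cosmids sharing the overlapping fragment can be expressed as a ratio over the control row. The Python sketch below is illustrative only: the variable and function names are hypothetical, the numbers are copied from Table 3, and the ratio is a simple division that makes no assumption about the units of the assay, which the table does not state.

```python
# Values copied from Table 3 (soy oil, lactate, hexadecane columns).
TABLE3 = {
    "71": (40.25, 27.25, 16.67),
    "201": (39.00, 18.00, 10.00),
    "505": (63.50, 28.75, 15.00),
    "726": (36.75, 25.25, 21.00),
}
CONTROL = (20.80, 11.50, 11.50)
MEDIA = ("380 + soy oil", "380 + lactate", "2xTY + hexadecane")


def fold_over_control(values, control=CONTROL):
    """Divide each medium's value by the control value for that medium."""
    return tuple(v / c for v, c in zip(values, control))


fold = {cosmid: fold_over_control(vals) for cosmid, vals in TABLE3.items()}

for cosmid, ratios in fold.items():
    summary = ", ".join(f"{m}: {r:.2f}x" for m, r in zip(MEDIA, ratios))
    print(f"cosmid #{cosmid}: {summary}")

# Cosmid with the lowest ratio in the lactate column among the four.
lowest_lactate = min(fold, key=lambda c: fold[c][1])
print(f"lowest lactate ratio: cosmid #{lowest_lactate}")
```

On these numbers, cosmid #201 has the lowest lactate ratio of the four, in line with the observation that the strains carrying cosmids #71, #505 and #726 give the highest stimulation in the lactate test.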

Example 4 Evidence for Involvement of LipQ/LipR in Lipase Expression

In order to assess the role of the lipQ/lipR operon, an insertional inactivation of the LipR ORF was constructed in the chromosome of strain Ps93. The resulting mutant, Ps1108, showed a significantly reduced halo on tributyrin agar plates as compared to Ps93.

In a second experiment, the lipase expression plasmid p24Lipo1 was introduced into strain Ps1108. The lipase expression was severely impaired as compared to Ps93 harboring p24Lipo1.

These observations suggest that the lipQ/lipR operon encodes the lipase regulatory proteins.

Example 5 Construction and Characterization of a LipQ/LipR Overexpressing P. alcaligenes Strain

The 4.5 kb EcoRI-HindIII fragment of one of the four lipase stimulating cosmids (#201) was subcloned onto pLAFR3 and inserted into a P. alcaligenes strain with a single lipase gene on the chromosome (Ps537). A threefold higher yield of lipase after a 10 liter fermentation was observed. (See FIG. 5.)

Subsequently, the 4.5 kb EcoRI-HindIII fragment was inserted onto the lipase expression plasmid p24Lipo1. A higher lipase expression was observed, as could be concluded from halo size on tributyrin plates. During growth in a shake flask, plasmid instability was observed. In order to overcome this instability, the fragment was also integrated into the chromosome, resulting in a strain with 2 lipQ/lipR gene copies in the chromosome (strain Ps1084). Insertion of the lipase expression plasmid p24Lipo1 in this strain resulted in higher lipase expression on the plate, but plasmid instability was observed during fermentation.

Example 6 Effect of Cosmid #600 on Production Plasmid Stability in Ps1084

Previously, a P. alcaligenes strain had been developed in which a second copy of lipQ-R had been integrated into the chromosome. When a lipase production plasmid (plasmid p24Lipo1) was introduced at high copy number (20) into Ps1084 and the strain fermented (10 liters), plasmid instability was observed. A shake-flask experiment was developed to model the situation in the fermenter. To monitor production plasmid stability and cosmid stability of transformed Ps1084, a week long shake-flask experiment was set up. After overnight growth in 10 ml 2xTY broth (supplemented with the required amount of neomycin and tetracycline), 1 ml of transformed culture was used to inoculate 100 ml fermentation medium 380 plus 200 μl soy oil, in shake-flasks. The inoculated shake flasks were incubated for 24 hours at 37° C. in an orbital shaker. One ml of 24 hour old culture was then used to inoculate successive shake-flasks. Throughout the duration of the experiment, daily samples were taken. The presence of a neomycin marker on the lipase production plasmid was used to monitor plasmid stability. The integrated lipQ-R strain with the high copy lipase production plasmid (Ps1084) was transformed with cosmid #600 to see whether plasmid stability was improved.
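The serial transfer scheme described above (1 ml of culture into 100 ml of fresh medium every 24 hours) fixes the dilution at each passage, and from it a rough number of cell doublings per passage can be estimated. The Python sketch below is illustrative only. It assumes, which the text does not state, that each flask regrows to about the same cell density as the one before it and that seven daily passages make up the week; the function name is hypothetical.

```python
import math


def generations_per_passage(inoculum_ml: float, medium_ml: float) -> float:
    """Doublings needed to regrow to the starting density after dilution.

    Assumption (not from the text): every passage reaches about the same
    final density, so doublings = log2(dilution factor).
    """
    dilution = (inoculum_ml + medium_ml) / inoculum_ml
    return math.log2(dilution)


per_passage = generations_per_passage(inoculum_ml=1.0, medium_ml=100.0)
passages = 7  # hypothetical: one passage per day for a week
week_total = passages * per_passage

print(f"about {per_passage:.1f} generations per passage")
print(f"about {week_total:.0f} generations over {passages} passages")
```

Under these assumptions each passage corresponds to between six and seven generations, so a week of daily transfers covers several dozen generations without selection for the production plasmid other than what the medium provides.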

FIG. 6 is a graphical representation of production plasmid stability in the transformed and untransformed Ps1084 (in duplicate). After 3-4 days, plasmid instability was detected in Ps1084, observed as the 80% drop in neomycin resistant colonies. Throughout the week long experiment, cosmid #600 transformed Ps1084 maintained a high degree of neomycin resistance, suggesting that cosmid #600 stabilized the production plasmid.

Example 7 Characterization of Cosmid #600

Cosmid #600 gave a positive signal when PCR was carried out using xcpR primers based on peptides from xcpR derived from Pseudomonas aeruginosa. DNA from cosmid #600 was digested with EcoRV and the resulting fragment mixture and purified fragments were ligated with SmaI-digested pUC19 (Appligene) using the Rapid DNA Ligation kit (Boehringer Mannheim). E. coli cells were then electroporated. Transformants were selected on 2xTY plates containing ampicillin (100 mg/l), X-Gal (Boehringer Mannheim; 40 mg/l) and IPTG (Gibco BRL; 1 mM). Transformants containing the recombinant plasmid were identified as white colonies and single colonies were streaked on to fresh 2xTY agar plates (with ampicillin) for purity.

Sequencing of PCR products, cosmid #600 DNA and subclones of cosmid #600 (see above) was achieved by the dye-deoxy termination method, using the ABI PRISM™ Dye Termination Cycle Sequencing Ready Reaction kit with AmpliTaq® DNA Polymerase, FS (Perkin Elmer) in conjunction with the Applied Biosystems 373A sequencer.

Sequencing of cosmid #600 was initiated with the primers used in the PCR to detect xcpR. In accordance with the restriction map of cosmid #600 (FIG. 8), an EcoRV restriction site was identified in the nucleic acid sequence of the PCR product. Sequence analysis revealed that the 609 bp amplification product could be translated to a putative amino acid sequence with 89% homology with P. aeruginosa and 73% with P. putida XcpR protein (amino acid residues 59-262), verifying that the xcpR gene had been identified by PCR.
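The translation step mentioned above, from a nucleic acid sequence to a putative amino acid sequence, can be sketched in a few lines of Python. The 609 bp amplification product itself is not reproduced in this section, so the sketch uses the first 60 nucleotides of SEQ ID NO: 1 from the sequence listing as a stand-in. It assumes the standard genetic code and reading frame 1, and the function and variable names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 60 nucleotides of SEQ ID NO: 1, as given in the sequence listing.
SEQ1_START = "ATGGGCGTATGTTCGCTGGC CAAGGACCAG GAAGTGCTGA TGTGGAACCG CGCCATGGAG"

peptide = translate(SEQ1_START)
print(peptide)  # MGVCSLAKDQEVLMWNRAME
```

The printed one-letter sequence corresponds to the first 20 residues of SEQ ID NO: 2 (Met Gly Val Cys Ser Leu Ala Lys Asp Gln Glu Val Leu Met Trp Asn Arg Ala Met Glu). A homology comparison such as the one reported for XcpR would additionally require an alignment step, which is not shown here.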

FIG. 8 shows the map of cosmid #600. By doing a PCR reaction with digested DNA, we were able to deduce the location of xcpR on the insert. The position of the xcpR gene suggests that the complete Xcp operon is present in cosmid #600.

To date, 17,612 nucleotides, encompassing xcpP, xcpQ, orfV, orfX, xcpR, xcpS, xcpT, xcpU, xcpV, xcpW, xcpX, xcpY, xcpZ and part of protein OrfY, have been sequenced (FIGS. 3AA-3BB, SEQ ID NO: 29).
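The sequence listing that follows reports each coding sequence length in base pairs and each translation length in amino acids. A simple consistency check is that a coding sequence ending in a stop codon should be three times the protein length plus three nucleotides. The Python sketch below applies that check to the lengths stated in the listing for SEQ ID NOS: 1 to 4 and 6 to 15. The pairing of each nucleic acid entry with the amino acid entry that follows it is an assumption based on their order in the listing, and the names used are hypothetical.

```python
# (nucleic acid SEQ ID NO, length in bp, amino acid SEQ ID NO, length in aa)
# Lengths are as stated in the sequence listing; the pairing is assumed.
LISTING = [
    (1, 1029, 2, 342),
    (3, 1416, 4, 471),
    (6, 1512, 7, 503),
    (8, 1215, 9, 404),
    (10, 423, 11, 140),
    (12, 642, 13, 213),
    (14, 1950, 15, 649),
]


def lengths_consistent(bp: int, aa: int) -> bool:
    """True if bp equals 3 nt per residue plus one 3 nt stop codon."""
    return bp == 3 * aa + 3


results = {
    (nt_id, aa_id): lengths_consistent(bp, aa)
    for nt_id, bp, aa_id, aa in LISTING
}

for (nt_id, aa_id), ok in results.items():
    status = "consistent" if ok else "inconsistent"
    print(f"SEQ ID NO: {nt_id} / SEQ ID NO: {aa_id}: {status}")
```

All seven pairs satisfy the relation, so the stated lengths are internally consistent; the check says nothing about the correctness of the sequences themselves.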

While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth, and as follows in the scope of the appended claims.

All patents and applications discussed in the specification are incorporated herein by reference.

37 1029 base pairs nucleic acid single linear unknown 1 ATGGGCGTATGTTCGCTGGC CAAGGACCAG GAAGTGCTGA TGTGGAACCG CGCCATGGAG 60 GAACTCACCGGCATCAGCGC GCAGCAGGTG GTCGGCTCGC GCCTGCTCAG CCTGGAGCAC 120 CCCTGGCGCGAGCTGCTGCA GGACTTCATC GCCCAGGACG AGGAGCACCT GCACAAGCAG 180 CACCTGCAACTGGACGGCGA GGTGCGCTGG CTCAACCTGC ACAAGGCGGC CATCGACGAA 240 CCGCTGGCGCCGGGCAACAG CGGCCTGGTG CTGCTGGTCG AGGACGTCAC CGAGACCCGC 300 GTGCTGGAAGACCAGCTGGT GCACTCCGAG CGTCTGGCCA GCATCGGCCG CCTGGCCGCC 360 GGGGTGGCCCACGAGATCGG CAATCCGGTC ACCGGCATCG CCTGCCTGGC GCAGAACCTG 420 CGCGAGGAGCGCGAGGGCGA CGAGGAGCTC GGCGAGATCA GCAACCAGAT CCTCGACCAG 480 ACCAAGCGCATCTCGCGCAT CGTCCAGTCG CTGATGAACT TCGCCCACGC CGGCCAGCAG 540 CAGCGCGCCGAATACCCGGT GAGCCTGGCC GAAGTGGCGC AGGACGCCAT CGGCCTGCTG 600 TCGCTGAACCGCCATGGCAC CGAAGTGCAG TTCTACAACC TGTGCGATCC CGAGCACCTG 660 GCCAAGGGCGACCCGCAGCG CCTGGCCCAG GTGCTGATCA ACCTGCTGTC CAACGCCCGC 720 GATGCCTCGCCGGCCGGCGG TGCCATCCGC GTGCGTAGCG AGGCCGAGGA GCAGAGCGTG 780 GTGCTGATCGTCGAGGACGA GGGCACGGGC ATTCCGCAGG CGATCATGGA CCGCCTGTTC 840 GAACCCTTCTTCACCACCAA GGACCCCGGC AAGGGCACCG GTTTGGGGCT CGCGCTGGTC 900 TATTCGATCGTGGAAGAGCA TTATGGGCAG ATCACCATCG ACAGCCCGGC CGATCCCGAG 960 CACCAGCGCGGAACCCGTTT CCGCGTGACC CTGCCGCGCT ATGTCGAAGC GACGTCCACA 1020 GCGACCTGA1029 342 amino acids amino acid single linear unknown 2 Met Gly Val CysSer Leu Ala Lys Asp Gln Glu Val Leu Met Trp Asn 1 5 10 15 Arg Ala MetGlu Glu Leu Thr Gly Ile Ser Ala Gln Gln Val Val Gly 20 25 30 Ser Arg LeuLeu Ser Leu Glu His Pro Trp Arg Glu Leu Leu Gln Asp 35 40 45 Phe Ile AlaGln Asp Glu Glu His Leu His Lys Gln His Leu Gln Leu 50 55 60 Asp Gly GluVal Arg Trp Leu Asn Leu His Lys Ala Ala Ile Asp Glu 65 70 75 80 Pro LeuAla Pro Gly Asn Ser Gly Leu Val Leu Leu Val Glu Asp Val 85 90 95 Thr GluThr Arg Val Leu Glu Asp Gln Leu Val His Ser Glu Arg Leu 100 105 110 AlaSer Ile Gly Arg Leu Ala Ala Gly Val Ala His Glu Ile Gly Asn 115 120 125Pro Val Thr Gly Ile Ala Cys Leu Ala Gln Asn Leu Arg Glu Glu Arg 130 135140 Glu Gly Asp Glu Glu Leu Gly Glu Ile Ser Asn 
Gln Ile Leu Asp Gln 145150 155 160 Thr Lys Arg Ile Ser Arg Ile Val Gln Ser Leu Met Asn Phe AlaHis 165 170 175 Ala Gly Gln Gln Gln Arg Ala Glu Tyr Pro Val Ser Leu AlaGlu Val 180 185 190 Ala Gln Asp Ala Ile Gly Leu Leu Ser Leu Asn Arg HisGly Thr Glu 195 200 205 Val Gln Phe Tyr Asn Leu Cys Asp Pro Glu His LeuAla Lys Gly Asp 210 215 220 Pro Gln Arg Leu Ala Gln Val Leu Ile Asn LeuLeu Ser Asn Ala Arg 225 230 235 240 Asp Ala Ser Pro Ala Gly Gly Ala IleArg Val Arg Ser Glu Ala Glu 245 250 255 Glu Gln Ser Val Val Leu Ile ValGlu Asp Glu Gly Thr Gly Ile Pro 260 265 270 Gln Ala Ile Met Asp Arg LeuPhe Glu Pro Phe Phe Thr Thr Lys Asp 275 280 285 Pro Gly Lys Gly Thr GlyLeu Gly Leu Ala Leu Val Tyr Ser Ile Val 290 295 300 Glu Glu His Tyr GlyGln Ile Thr Ile Asp Ser Pro Ala Asp Pro Glu 305 310 315 320 His Gln ArgGly Thr Arg Phe Arg Val Thr Leu Pro Arg Tyr Val Glu 325 330 335 Ala ThrSer Thr Ala Thr 340 1416 base pairs nucleic acid single linear unknown 3ATGCCGCATA TCCTCATCGT CGAAGACGAA ACCATCATCC GCTCCGCCCT GCGCCGCCTG 60CTGGAACGCA ACCAGTACCA GGTCAGCGAG GCCGGTTCGG TTCAGGAGGC CCAGGAGCGC 120TACAGCATTC CGACCTTCGA CCTGGTGGTC AGCGACCTGC GCCTGCCCGG CGCCCCCGGC 180ACCGAGCTGA TCAAGCTGGC CGACGGCACC CCGGTACTGA TCATGACCAG CTATGCCAGC 240CTGCGCTCGG CGGTGGACTC GATGAAGATG GGCGCGGTGG ACTACATCGC CAAGCCCTTC 300GATCACGACG AGATGCTCCA GGCCGTGGCG CGTATCCTGC GCGATCACCA GGAGGCCAAG 360CGCAACCCGC CAAGCGAGGC GCCCAGCAAG TCCGCCGGCA AGGGCAACGG CGCCACCGCC 420GAGGGCGAGA TCGGCATCAT CGGCTCCTGC GCCGCCATGC AGGACCTTTA CGGCAAGATC 480CGCAAGGTCG CTCCCACCGA TTCCAACGTA CTGATCCAGG GCGAGTCCGG CACCGGCAAG 540GAGCTGGTCG CGCGTGCGCT GCACAACCTC TCGCGTCGCG CCAAGGCACC GCTGATCTCG 600GTGAACTGCG CGGCCATCCC CGAGACCCTG ATCGAGTCCG AACTGTTCGG CCACGAGAAA 660GGTGCCTTCA CCGGCGCCAG CGCCGGCCGC GCCGGCCTGG TCGAAGCGGC CGACGGCGGC 720ACCCTGTTCC TCGACGAGAT CGGCGAGCTG CCGCTGGAGG CGCAGGCCCG CCTGCTGCGC 780GTGCTGCAGG AGGGCGAGAT CCGTCGGGTC GGCTCGGTGC AGTCACAGAA GGTCGATGTA 840CGCCTGATCG CCGCTACCCA CCGCGACCTC AAGACGCTGG CCAAGACCGG CCAGTTCCGC 900GAGGACCTCT 
ACTACCGCCT GCACGTCATC GCCCTCAAGC TGCCGCCACT GCGCGAGCGC 960GGCGCCGACG TCAACGAGAT CGCCCGCGCC TTCCTCGTCC GCCAGTGCCA GCGCATGGGC 1020CGCGAGGACC TGCGCTTCGC TCAGGATGCC GAGCAGGCGA TCCGCCACTA CCCCTGGCCG 1080GGCAACGTGC GCGAGCTGGA GAATGCCATC GAGCGCGCGG TGATCCTCTG CGAGGGCGCG 1140GAAATTTCCG CCGAGCTGCT GGGCATCGAC ATCGAGCTGG ACGACCTGGA GGACGGCGAC 1200TTCGGCGAAC AGCCACAGCA GACCGCGGCC AACCACGAAC CGACCGAGGA CCTGTCGCTG 1260GAGGACTACT TCCAGCACTT CGTACTGGAG CACCAGGATC ACATGACCGA GACCGAACTG 1320GCGCGCAAGC TCGGCATCAG CCGCAAGTGC CTGTGGGAGC GCCGTCAGCG CCTGGGCATT 1380CCGCGGCGCA AGTCGGGCGC GGCGACCGGC TCCTGA 1416 471 amino acids amino acidsingle linear unknown 4 Met Pro His Ile Leu Ile Val Glu Asp Glu Thr IleIle Arg Ser Ala 1 5 10 15 Leu Arg Arg Leu Leu Glu Arg Asn Gln Tyr GlnVal Ser Glu Ala Gly 20 25 30 Ser Val Gln Glu Ala Gln Glu Arg Tyr Ser IlePro Thr Phe Asp Leu 35 40 45 Val Val Ser Asp Leu Arg Leu Pro Gly Ala ProGly Thr Glu Leu Ile 50 55 60 Lys Leu Ala Asp Gly Thr Pro Val Leu Ile MetThr Ser Tyr Ala Ser 65 70 75 80 Leu Arg Ser Ala Val Asp Ser Met Lys MetGly Ala Val Asp Tyr Ile 85 90 95 Ala Lys Pro Phe Asp His Asp Glu Met LeuGln Ala Val Ala Arg Ile 100 105 110 Leu Arg Asp His Gln Glu Ala Lys ArgAsn Pro Pro Ser Glu Ala Pro 115 120 125 Ser Lys Ser Ala Gly Lys Gly AsnGly Ala Thr Ala Glu Gly Glu Ile 130 135 140 Gly Ile Ile Gly Ser Cys AlaAla Met Gln Asp Leu Tyr Gly Lys Ile 145 150 155 160 Arg Lys Val Ala ProThr Asp Ser Asn Val Leu Ile Gln Gly Glu Ser 165 170 175 Gly Thr Gly LysGlu Leu Val Ala Arg Ala Leu His Asn Leu Ser Arg 180 185 190 Arg Ala LysAla Pro Leu Ile Ser Val Asn Cys Ala Ala Ile Pro Glu 195 200 205 Thr LeuIle Glu Ser Glu Leu Phe Gly His Glu Lys Gly Ala Phe Thr 210 215 220 GlyAla Ser Ala Gly Arg Ala Gly Leu Val Glu Ala Ala Asp Gly Gly 225 230 235240 Thr Leu Phe Leu Asp Glu Ile Gly Glu Leu Pro Leu Glu Ala Gln Ala 245250 255 Arg Leu Leu Arg Val Leu Gln Glu Gly Glu Ile Arg Arg Val Gly Ser260 265 270 Val Gln Ser Gln Lys Val Asp Val Arg Leu Ile Ala Ala Thr HisArg 275 280 285 Asp Leu Lys 
Thr Leu Ala Lys Thr Gly Gln Phe Arg Glu AspLeu Tyr 290 295 300 Tyr Arg Leu His Val Ile Ala Leu Lys Leu Pro Pro LeuArg Glu Arg 305 310 315 320 Gly Ala Asp Val Asn Glu Ile Ala Arg Ala PheLeu Val Arg Gln Cys 325 330 335 Gln Arg Met Gly Arg Glu Asp Leu Arg PheAla Gln Asp Ala Glu Gln 340 345 350 Ala Ile Arg His Tyr Pro Trp Pro GlyAsn Val Arg Glu Leu Glu Asn 355 360 365 Ala Ile Glu Arg Ala Val Ile LeuCys Glu Gly Ala Glu Ile Ser Ala 370 375 380 Glu Leu Leu Gly Ile Asp IleGlu Leu Asp Asp Leu Glu Asp Gly Asp 385 390 395 400 Phe Gly Glu Gln ProGln Gln Thr Ala Ala Asn His Glu Pro Thr Glu 405 410 415 Asp Leu Ser LeuGlu Asp Tyr Phe Gln His Phe Val Leu Glu His Gln 420 425 430 Asp His MetThr Glu Thr Glu Leu Ala Arg Lys Leu Gly Ile Ser Arg 435 440 445 Lys CysLeu Trp Glu Arg Arg Gln Arg Leu Gly Ile Pro Arg Arg Lys 450 455 460 SerGly Ala Ala Thr Gly Ser 465 470 19 base pairs nucleic acid single linearunknown 5 GCCTGGAGGA TTACCAGTC 19 1512 base pairs nucleic acid singlelinear unknown 6 ATGTCCACCG ATACCCACGC CGCCCTGACG GCTCCCGCAA GCCCCGCCTTGCGCCCGCTG 60 CCCTTCGCCT TCGCCAAACG CCACGGCGTG CTGCTGCGCG AGCCCTTCGGCCAGGTCCAG 120 CTGCAGGTGC GCCGCGGTGC CAGCCTGGCC GCCGTGCAGG AGGCCCAGCGCTTCGCCGGC 180 CGCGTGCTGC CGCTGCACTG GCTGGAGCCC GAGGCCTTCG AGCAGGAGCTGGCCCTGGCC 240 TACCAGCGCG ACTCCTCCGA GGTGCGGCAG ATGGCCGAGG GCATGGGTGCCGAACTTGAC 300 CTAGCCAGCC TGGCCGAACT CACTCCCGAA TCCGGCGACC TGCTGGAGCAGGAAGATGAC 360 GCGCCGATCA TCCGCCTGAT CAACGCCATC CTCAGCGAGG CGATCAAGGCCGGCGCCTCC 420 GACATCCACC TGGAAACCTT CGAGAAACGC CTGGTGGTGC GCTTTCGCGTCGACGGCATC 480 CTCCGCGAAG TGATCGAACC GCGCCGCGAG CTGGCGGCGC TGCTGGTCTCGCGGGTCAAG 540 GTCATGGCGC GCCTGGACAT CGCCGAGAAG CGCGTACCGC AGGACGGCCGTATTTCGCTC 600 AAGGTCGGCG GTCGCGAGGT GGATATCCGC GTCTCCACCC TGCCGTCGGCCAACGGCGAG 660 CGGGTGGTGC TGCGTCTGCT CGACAAGCAG GCCGGGCGCC TGTCGCTCACGCATCTGGGC 720 ATGAGCGAGC GCGACCGCCG CCTGCTCGAC GACAACCTGC GCAAGCCGCACGGCATCATC 780 CTAGTCACCG GCCCCACCGG CTCGGGCAAG ACCACCACCC TGTACGCCGGCCTGGTCACC 840 CTCAACGACC GCTCGCGCAA TATCCTCACG GTGGAAGACC 
CGATCGAGTACTACCTGGAA 900 GGCATCGGCC AGACCCAGGT CAACCCGCGG GTGGACATGA CCTTCGCCCGCGGCCTGCGC 960 GCCATCCTGC GCCAGGACCC GGACGTGGTG ATGGTCGGCG AGATCCGCGACCAGGAGACC 1020 GCCGACATCG CCGTGCAGGC CTCGCTCACC GGCCACCTGG TGCTCTCCACCCTGCACACC 1080 AACAGCGCCG TCGGCGCCGT CACCCGCCTG GTCGACATGG GCGTCGAGCCCTTCCTGCTG 1140 TCGTCGTCCC TGCTCGGCGT GCTGGCCCAG CGCCTGGTGC GCGTGCTCTGCGTGCACTGC 1200 CGCGAGGCGC GCCCGGCTGA CGCGGCCGAG TGCGGCCTGC TCGGCCTCGACCCGCACAGC 1260 CAGCCCCTGA TCTACCACGC CAAGGGCTGC CCGGAGTGCC ACCAGCAGGGCTACCGCGGC 1320 CGTACTGGCA TCTACGAGCT GGTGATCTTC GACGACCAGA TGCGCACCCTGGTGCACAAC 1380 GGCGCCGGTG AGCAGGAGCT GATTCGCCAC GCCCGCAGCC TCGGCCCGAGCATCCGCGAC 1440 GATGGCCGGC GCAAGGTGCT GGAAGGGGTG ACCAGCCTGG AAGAAGTGTTGCGCGTGACC 1500 CGGGAAGACT GA 1512 503 amino acids amino acid singlelinear unknown 7 Met Ser Thr Asp Thr His Ala Ala Leu Thr Ala Pro Ala SerPro Ala 1 5 10 15 Leu Arg Pro Leu Pro Phe Ala Phe Ala Lys Arg His GlyVal Leu Leu 20 25 30 Arg Glu Pro Phe Gly Gln Val Gln Leu Gln Val Arg ArgGly Ala Ser 35 40 45 Leu Ala Ala Val Gln Glu Ala Gln Arg Phe Ala Gly ArgVal Leu Pro 50 55 60 Leu His Trp Leu Glu Pro Glu Ala Phe Glu Gln Glu LeuAla Leu Ala 65 70 75 80 Tyr Gln Arg Asp Ser Ser Glu Val Arg Gln Met AlaGlu Gly Met Gly 85 90 95 Ala Glu Leu Asp Leu Ala Ser Leu Ala Glu Leu ThrPro Glu Ser Gly 100 105 110 Asp Leu Leu Glu Gln Glu Asp Asp Ala Pro IleIle Arg Leu Ile Asn 115 120 125 Ala Ile Leu Ser Glu Ala Ile Lys Ala GlyAla Ser Asp Ile His Leu 130 135 140 Glu Thr Phe Glu Lys Arg Leu Val ValArg Phe Arg Val Asp Gly Ile 145 150 155 160 Leu Arg Glu Val Ile Glu ProArg Arg Glu Leu Ala Ala Leu Leu Val 165 170 175 Ser Arg Val Lys Val MetAla Arg Leu Asp Ile Ala Glu Lys Arg Val 180 185 190 Pro Gln Asp Gly ArgIle Ser Leu Lys Val Gly Gly Arg Glu Val Asp 195 200 205 Ile Arg Val SerThr Leu Pro Ser Ala Asn Gly Glu Arg Val Val Leu 210 215 220 Arg Leu LeuAsp Lys Gln Ala Gly Arg Leu Ser Leu Thr His Leu Gly 225 230 235 240 MetSer Glu Arg Asp Arg Arg Leu Leu Asp Asp Asn Leu Arg Lys Pro 245 250 255His Gly Ile 
Ile Leu Val Thr Gly Pro Thr Gly Ser Gly Lys Thr Thr 260 265270 Thr Leu Tyr Ala Gly Leu Val Thr Leu Asn Asp Arg Ser Arg Asn Ile 275280 285 Leu Thr Val Glu Asp Pro Ile Glu Tyr Tyr Leu Glu Gly Ile Gly Gln290 295 300 Thr Gln Val Asn Pro Arg Val Asp Met Thr Phe Ala Arg Gly LeuArg 305 310 315 320 Ala Ile Leu Arg Gln Asp Pro Asp Val Val Met Val GlyGlu Ile Arg 325 330 335 Asp Gln Glu Thr Ala Asp Ile Ala Val Gln Ala SerLeu Thr Gly His 340 345 350 Leu Val Leu Ser Thr Leu His Thr Asn Ser AlaVal Gly Ala Val Thr 355 360 365 Arg Leu Val Asp Met Gly Val Glu Pro PheLeu Leu Ser Ser Ser Leu 370 375 380 Leu Gly Val Leu Ala Gln Arg Leu ValArg Val Leu Cys Val His Cys 385 390 395 400 Arg Glu Ala Arg Pro Ala AspAla Ala Glu Cys Gly Leu Leu Gly Leu 405 410 415 Asp Pro His Ser Gln ProLeu Ile Tyr His Ala Lys Gly Cys Pro Glu 420 425 430 Cys His Gln Gln GlyTyr Arg Gly Arg Thr Gly Ile Tyr Glu Leu Val 435 440 445 Ile Phe Asp AspGln Met Arg Thr Leu Val His Asn Gly Ala Gly Glu 450 455 460 Gln Glu LeuIle Arg His Ala Arg Ser Leu Gly Pro Ser Ile Arg Asp 465 470 475 480 AspGly Arg Arg Lys Val Leu Glu Gly Val Thr Ser Leu Glu Glu Val 485 490 495Leu Arg Val Thr Arg Glu Asp 500 1215 base pairs nucleic acid singlelinear unknown 8 ATGGCCGCCT TCGAATACAT CGCCCTGGAT GCCAGGGGCC GCCAGCAGAAGGGCGTGCTG 60 GAGGGCGACA GCGCCCGCCA GGTGCGCCAG CTGCTGCGCG ACAAACAGTTGTCGCCGCTG 120 CAGGTCGAGC CGGTACAGCG CAGGGAGCAG GCCGAGGCTG GTGGCTTCAGCCTGCGCCGT 180 GGCCTGTCGG CGCGCGACCT GGCGCTGGTC ACCCGTCAGC TGGCGACCCTGATCGGCGCC 240 GCGCTGCCCA TCGAGGAAGC GCTGCGCGCC GCCGCCGCGC AGTCGCGCCAGCCGCGCATC 300 CAGTCGATGC TGTTGGCGGT GCGCGCCAAG GTGCTCGAGG GCCACAGCCTGGCCAAGGCC 360 CTGGCCTCCT ACCCGGCGGC CTTCCCCGAG CTGTACCGCG CCACGGTGGCGGCCGGCGAG 420 CATGCGGGGC ACCTGGCGCC GGTGCTGGAG CAGCTGGCCG ACTACACCGAGCAGCGCCAG 480 CAGTCGCGGC AGAAGATCCA GATGGCGCTG CTCTACCCGG TGATCCTGATGCTCGCTTCG 540 CTGGGCATCG TCGGTTTTCT GCTCGGCTAC GTGGTGCCGG ATGTGGTGCGGGTGTTCGTC 600 GACTCCGGGC AGACCCTGCC GGCGCTGACC CGCGGGCTGA TTTTCCTCAGCGAGCTGGTC 660 AAGTCCTGGG GCGCCCTGGC CATCGTCCTG 
GCGGTGCTCG GCGTGCTCGCCTTTCGCCGC 720 GCCTTGCGCA GCGAGGATCT GCGCCGGCGC TGGCATGCCT TCCTGCTGCGCGTGCCGCTG 780 GTCGGTGGGC TGATCGCCGC CACCGAGACG GCACGCTTCG CCTCGACCCTGGCCATCCTG 840 GTGCGCAGCG GCGTGCCACT GGTGGAGGCG CTGGCCATCG GCGCCGAGGTGGTGTCCAAC 900 CTGATCATCC GCAGCGACGT GGCCAACGCC ACCCAGCGCG TGCGCGAGGGCGGCAGCCTG 960 TCGCGCGCGC TGGAAGCCAG CCGGCAGTTT CCGCCGATGA TGCTGCACATGATCGCCAGC 1020 GGCGAGCGTT CCGGCGAGCT GGACCAGATG CTGGCGCGCA CGGCGCGCAACCAGGAAAAC 1080 GACCTGGCGG CCACCATCGG CCTGCTGGTG GGGCTGTTCG AGCCGTTCATGCTGGTATTC 1140 ATGGGCGCGG TGGTGCTGGT GATCGTGCTG GCCATCCTGC TGCCGATTCTTTCTCTGAAC 1200 CAACTGGTGG GTTGA 1215 404 amino acids amino acid singlelinear unknown 9 Met Ala Ala Phe Glu Tyr Ile Ala Leu Asp Ala Arg Gly ArgGln Gln 1 5 10 15 Lys Gly Val Leu Glu Gly Asp Ser Ala Arg Gln Val ArgGln Leu Leu 20 25 30 Arg Asp Lys Gln Leu Ser Pro Leu Gln Val Glu Pro ValGln Arg Arg 35 40 45 Glu Gln Ala Glu Ala Gly Gly Phe Ser Leu Arg Arg GlyLeu Ser Ala 50 55 60 Arg Asp Leu Ala Leu Val Thr Arg Gln Leu Ala Thr LeuIle Gly Ala 65 70 75 80 Ala Leu Pro Ile Glu Glu Ala Leu Arg Ala Ala AlaAla Gln Ser Arg 85 90 95 Gln Pro Arg Ile Gln Ser Met Leu Leu Ala Val ArgAla Lys Val Leu 100 105 110 Glu Gly His Ser Leu Ala Lys Ala Leu Ala SerTyr Pro Ala Ala Phe 115 120 125 Pro Glu Leu Tyr Arg Ala Thr Val Ala AlaGly Glu His Ala Gly His 130 135 140 Leu Ala Pro Val Leu Glu Gln Leu AlaAsp Tyr Thr Glu Gln Arg Gln 145 150 155 160 Gln Ser Arg Gln Lys Ile GlnMet Ala Leu Leu Tyr Pro Val Ile Leu 165 170 175 Met Leu Ala Ser Leu GlyIle Val Gly Phe Leu Leu Gly Tyr Val Val 180 185 190 Pro Asp Val Val ArgVal Phe Val Asp Ser Gly Gln Thr Leu Pro Ala 195 200 205 Leu Thr Arg GlyLeu Ile Phe Leu Ser Glu Leu Val Lys Ser Trp Gly 210 215 220 Ala Leu AlaIle Val Leu Ala Val Leu Gly Val Leu Ala Phe Arg Arg 225 230 235 240 AlaLeu Arg Ser Glu Asp Leu Arg Arg Arg Trp His Ala Phe Leu Leu 245 250 255Arg Val Pro Leu Val Gly Gly Leu Ile Ala Ala Thr Glu Thr Ala Arg 260 265270 Phe Ala Ser Thr Leu Ala Ile Leu Val Arg Ser Gly Val Pro Leu Val 
275280 285 Glu Ala Leu Ala Ile Gly Ala Glu Val Val Ser Asn Leu Ile Ile Arg290 295 300 Ser Asp Val Ala Asn Ala Thr Gln Arg Val Arg Glu Gly Gly SerLeu 305 310 315 320 Ser Arg Ala Leu Glu Ala Ser Arg Gln Phe Pro Pro MetMet Leu His 325 330 335 Met Ile Ala Ser Gly Glu Arg Ser Gly Glu Leu AspGln Met Leu Ala 340 345 350 Arg Thr Ala Arg Asn Gln Glu Asn Asp Leu AlaAla Thr Ile Gly Leu 355 360 365 Leu Val Gly Leu Phe Glu Pro Phe Met LeuVal Phe Met Gly Ala Val 370 375 380 Val Leu Val Ile Val Leu Ala Ile LeuLeu Pro Ile Leu Ser Leu Asn 385 390 395 400 Gln Leu Val Gly 423 basepairs nucleic acid single linear unknown 10 ATGTACAAAC AGAAAGGCTTCACGCTGATC GAAATCATGG TGGTGGTGGT CATCCTCGGC 60 ATTCTCGCTG CCCTGGTGGTGCCGCAGGTG ATGGGCCGCC CGGACCAGGC CAAGGTCACC 120 GCGGCGCAGA ACGACATCCGCGCCATCGGC GCCGCGCTGG ACATGTACAA GCTGGACAAC 180 CAGAACTACC CGAGCACCCAGCAGGGCCTG GAGGCCCTGG TGAAGAAACC CACCGGCACG 240 CCGGCGGCGA AGAACTGGAACGCCGAGGGC TACCTGAAGA AGCTGCCGGT CGACCCCTGG 300 GGCAACCAGT ACCTGTACCTGTCGCCGGGC ACCCGCGGCA AGATCGACCT GTATTCGCTG 360 GGCGCCGACG GCCAGGAAGGCGGCGAGGGG ACCGACGCCG ACATCGGCAA CTGGGATCTC 420 TGA 423 140 amino acidsamino acid single linear unknown 11 Met Tyr Lys Gln Lys Gly Phe Thr LeuIle Glu Ile Met Val Val Val 1 5 10 15 Val Ile Leu Gly Ile Leu Ala AlaLeu Val Val Pro Gln Val Met Gly 20 25 30 Arg Pro Asp Gln Ala Lys Val ThrAla Ala Gln Asn Asp Ile Arg Ala 35 40 45 Ile Gly Ala Ala Leu Asp Met TyrLys Leu Asp Asn Gln Asn Tyr Pro 50 55 60 Ser Thr Gln Gln Gly Leu Glu AlaLeu Val Lys Lys Pro Thr Gly Thr 65 70 75 80 Pro Ala Ala Lys Asn Trp AsnAla Glu Gly Tyr Leu Lys Lys Leu Pro 85 90 95 Val Asp Pro Trp Gly Asn GlnTyr Leu Tyr Leu Ser Pro Gly Thr Arg 100 105 110 Gly Lys Ile Asp Leu TyrSer Leu Gly Ala Asp Gly Gln Glu Gly Gly 115 120 125 Glu Gly Thr Asp AlaAsp Ile Gly Asn Trp Asp Leu 130 135 140 642 base pairs nucleic acidsingle linear unknown 12 TTGAGTAGCA CCCGCACCCG CCTGCCCGCC TGGCTGCAGCGCCACGGCGT GACCGGCCTC 60 TGCCTGCTCG TGGTGCTGCT CATCACCCTC AGCCTGAGCAAGCAGAGCAT CGACTTCCTT 120 CGCCTGCTGC 
GCAGCGAGGC CGCGCCACCG CCCGCCCCAGAGAGCATCGC CGAGCGCCAG 180 CCGCTGTCCA TCCAGCGCCT GCAGCATCTG TTCGGCACGCCCGCGGCCAG GCCGCGCGGC 240 GACCAGGCCG CCCCCGCCAC CCGGCAGCAG ATGACCCTGCTGGCCAGCTT CGTCAACCCG 300 GACGCCAAGC GCTCCACGGC GATCATCCAG GTCGCCGGCGACAAACCCAA GCGCATCGCC 360 GTGGGCGAAT CGGTCAACGT CAGCACCCGC CTGCAGGCCGTCTATCAGGA CCACGTGGTG 420 CTCGACCGCG GCGGCGTCGA GGAGAGCCTG CGCTTCCCCGCCGTGCGCCA GCCCTCTCTG 480 ACGCCGGCCT ACTCGGCGCT GGAGCCCACC GCCAGCCAACTGGAACAGCT GCAGGACGAA 540 GACGTCCAGG CCCTGCAGGA GCGCATCCAG ACCCTTCAACAACGCATGGA AGGCGGCGAC 600 ATCCCGCAGC CCGAAATACC GGAAGCCGAA GACAGCCCAT GA642 213 amino acids amino acid single linear unknown 13 Met Ser Ser ThrArg Thr Arg Leu Pro Ala Trp Leu Gln Arg His Gly 1 5 10 15 Val Thr GlyLeu Cys Leu Leu Val Val Leu Leu Ile Thr Leu Ser Leu 20 25 30 Ser Lys GlnSer Ile Asp Phe Leu Arg Leu Leu Arg Ser Glu Ala Ala 35 40 45 Pro Pro ProAla Pro Glu Ser Ile Ala Glu Arg Gln Pro Leu Ser Ile 50 55 60 Gln Arg LeuGln His Leu Phe Gly Thr Pro Ala Ala Arg Pro Arg Gly 65 70 75 80 Asp GlnAla Ala Pro Ala Thr Arg Gln Gln Met Thr Leu Leu Ala Ser 85 90 95 Phe ValAsn Pro Asp Ala Lys Arg Ser Thr Ala Ile Ile Gln Val Ala 100 105 110 GlyAsp Lys Pro Lys Arg Ile Ala Val Gly Glu Ser Val Asn Val Ser 115 120 125Thr Arg Leu Gln Ala Val Tyr Gln Asp His Val Val Leu Asp Arg Gly 130 135140 Gly Val Glu Glu Ser Leu Arg Phe Pro Ala Val Arg Gln Pro Ser Leu 145150 155 160 Thr Pro Ala Tyr Ser Ala Leu Glu Pro Thr Ala Ser Gln Leu GluGln 165 170 175 Leu Gln Asp Glu Asp Val Gln Ala Leu Gln Glu Arg Ile GlnThr Leu 180 185 190 Gln Gln Arg Met Glu Gly Gly Asp Ile Pro Gln Pro GluIle Pro Glu 195 200 205 Ala Glu Asp Ser Pro 210 1950 base pairs nucleicacid single linear unknown 14 ATGATCGACT CCAGAATTCC GCCGCACAAACGCCTGCCCC TCGCCCTGCT GCTGGCCGCG 60 AGCTGCCTCG CCGCCCCGCT GCCGCTCGTCCATGCCGCCG AGCCGGTGGC GGTGAGCCAG 120 GGCGCCGAGA CCTGGACCAT CAACATGAAGGACGCCGATA TCCGCGACTT CATCGACCAG 180 GTGGCGCAGA TCTCTGGCGA GACCTTCGTCGTCGATCCGC GGGTCAAGGG CCAGGTCACG 240 GTGATCTCCA AGACCCCGCT 
GGGCCTCGAGGAGGTCTACC AGCTGTTCCT TTCGGTGATG 300 AGCACCCATG GCTTCAGCGT GCTGGCACAGGGCGACCAGG CGCGCATCGT GCCGGTCACC 360 GAGGCGCGTA GCGGCGCCAA CAGCAGCCGCAGCGCGCCGG ACGATGTGCA GACCGAGCTG 420 ATCCAGGTGC AGCACACCTC GGTCAACGAACTGATCCCGC TGATCCGCCC GCTGGTGCCG 480 CAGAACGGCC ACCTGGCGGC GGTCGCCGCCTCCAACGCGC TGATCATCAG CGACCGCCGG 540 GCNAATATCG AACGCATCCG CGAACTGATCGCCGAGCTCG ATGCCCAGGG CGGCGGCGAC 600 TACAACGTGA TCAACCTGCA GCATGCCTGGGTACTGGACG CCGCCGAGGC ACTGAACAAC 660 GCGGTGATGC GCAACGAGAA AAACAGCGCCGGCACCCGGG TGATTGCCGA CGCCCGCACC 720 AACCGCCTGA TCCTCCTCGG CCCGCCGGCCGCCCGCCAGC GCCTGGCCAA CCTGGCCCGC 780 TCGCTGGACA TCCCCAGCAC CCGTTCGGCCAATGCGCGGG TAATTCGCCT ACGCCACAGC 840 GACGCCAAGA GCCTGGCCGA GACCCTGGGCGACATCTCCG AGGGGTTGAA GACCGCGGAG 900 GGTGGTGGCG AAGCCGCCAG CAGCAAGCCGCAGAACATCC TGATCCGCGC CGACGAGAGC 960 CTCAATGCCC TGGTCCTGCT GGCCGATCCGGACACCGTGG CGACCCTCGA GGAAATCGTG 1020 CGCAACCTCG ACGTGCCGCG CGCCCAGGTGATGGTCGAGG CGGCCATCGT GGAAATCTCC 1080 GGGGACATCA GCGACGCCCT CGGCGTGCAGTGGGCGGTGG ATGCCCGCGG CGGCACCGGC 1140 GGCCTCGGCG GGGTCAACTT CGGCAATACCGGGCTATCGG TGGGCACCGT GCTCAAGGCC 1200 ATCCAGAACG AGGAAATCCC CGATGACCTGACCCTGCCGG ACGGCGCCAT CATCGGCATC 1260 GGCACCGAGA ACTTCGGCGC GCTGATCACTGCCCTCTCTG CCAACAGCAA GAGCAACCTG 1320 CTGTCCACGC CCAGCCTGCT GACCCTGGACAACCAGGAGG CGGAAATCCT GGTCGGGCAG 1380 AACGTGCCTT TCCAGACCGG CTCCTACACCACCGACGCCT CGGGGGCGAA CAACCCCTTC 1440 ACCACCATTG AGCGCGAGGA CATCGGCGTGACCCTCAAGG TCACCCCGCA CATCAACGAC 1500 GGCGCCACCC TGCGCCTGGA AGTGGAGCAGGAGATCTCCT CCATCGCCCC CAGCGCCGGG 1560 GTCAATGCCC AGGCGGTGGA CCTGGTGACCAACAAGCGCT CGATCAAGAG CGTGATCCTG 1620 GCCGACGACG GCCAGGTCAT AGTGCTGGGAGGGCTGATCC AGGACGACGT CACCAGCACC 1680 GACTCCAAGG TGCCGCTGCT GGGTGACATCCCGCTGATCG GCCGGCTGTT CCGCTCGACC 1740 AAGGACACCC ACGTCAAGCG CAACCTGATGGTGTTCCTGC GCCCGACCAT CGTCCGCGAC 1800 CGCGCCGGCA TGGCCGCGCT GTCGGGCAAGAAGTACAGCG ACATCAGCGT GCTGGGTGCC 1860 GACGAGGATG GCCACAGCAG CCTGCCGGGCAGCGCCGAGC GCCTGTTCGA CAAACCCGGC 1920 GCCGGTGCCG TGGACCTGCG CGACCAGTGA1950 649 amino acids amino acid single 
linear unknown 15 Met Ile Asp SerArg Ile Pro Pro His Lys Arg Leu Pro Leu Ala Leu 1 5 10 15 Leu Leu AlaAla Ser Cys Leu Ala Ala Pro Leu Pro Leu Val His Ala 20 25 30 Ala Glu ProVal Ala Val Ser Gln Gly Ala Glu Thr Trp Thr Ile Asn 35 40 45 Met Lys AspAla Asp Ile Arg Asp Phe Ile Asp Gln Val Ala Gln Ile 50 55 60 Ser Gly GluThr Phe Val Val Asp Pro Arg Val Lys Gly Gln Val Thr 65 70 75 80 Val IleSer Lys Thr Pro Leu Gly Leu Glu Glu Val Tyr Gln Leu Phe 85 90 95 Leu SerVal Met Ser Thr His Gly Phe Ser Val Leu Ala Gln Gly Asp 100 105 110 GlnAla Arg Ile Val Pro Val Thr Glu Ala Arg Ser Gly Ala Asn Ser 115 120 125Ser Arg Ser Ala Pro Asp Asp Val Gln Thr Glu Leu Ile Gln Val Gln 130 135140 His Thr Ser Val Asn Glu Leu Ile Pro Leu Ile Arg Pro Leu Val Pro 145150 155 160 Gln Asn Gly His Leu Ala Ala Val Ala Ala Ser Asn Ala Leu IleIle 165 170 175 Ser Asp Arg Arg Ala Asn Ile Glu Arg Ile Arg Glu Leu IleAla Glu 180 185 190 Leu Asp Ala Gln Gly Gly Gly Asp Tyr Asn Val Ile AsnLeu Gln His 195 200 205 Ala Trp Val Leu Asp Ala Ala Glu Ala Leu Asn AsnAla Val Met Arg 210 215 220 Asn Glu Lys Asn Ser Ala Gly Thr Arg Val IleAla Asp Ala Arg Thr 225 230 235 240 Asn Arg Leu Ile Leu Leu Gly Pro ProAla Ala Arg Gln Arg Leu Ala 245 250 255 Asn Leu Ala Arg Ser Leu Asp IlePro Ser Thr Arg Ser Ala Asn Ala 260 265 270 Arg Val Ile Arg Leu Arg HisSer Asp Ala Lys Ser Leu Ala Glu Thr 275 280 285 Leu Gly Asp Ile Ser GluGly Leu Lys Thr Ala Glu Gly Gly Gly Glu 290 295 300 Ala Ala Ser Ser LysPro Gln Asn Ile Leu Ile Arg Ala Asp Glu Ser 305 310 315 320 Leu Asn AlaLeu Val Leu Leu Ala Asp Pro Asp Thr Val Ala Thr Leu 325 330 335 Glu GluIle Val Arg Asn Leu Asp Val Pro Arg Ala Gln Val Met Val 340 345 350 GluAla Ala Ile Val Glu Ile Ser Gly Asp Ile Ser Asp Ala Leu Gly 355 360 365Val Gln Trp Ala Val Asp Ala Arg Gly Gly Thr Gly Gly Leu Gly Gly 370 375380 Val Asn Phe Gly Asn Thr Gly Leu Ser Val Gly Thr Val Leu Lys Ala 385390 395 400 Ile Gln Asn Glu Glu Ile Pro Asp Asp Leu Thr Leu Pro Asp GlyAla 405 410 415 Ile Ile Gly Ile Gly Thr Glu Asn 
Phe Gly Ala Leu Ile ThrAla Leu 420 425 430 Ser Ala Asn Ser Lys Ser Asn Leu Leu Ser Thr Pro SerLeu Leu Thr 435 440 445 Leu Asp Asn Gln Glu Ala Glu Ile Leu Val Gly GlnAsn Val Pro Phe 450 455 460 Gln Thr Gly Ser Tyr Thr Thr Asp Ala Ser GlyAla Asn Asn Pro Phe 465 470 475 480 Thr Thr Ile Glu Arg Glu Asp Ile GlyVal Thr Leu Lys Val Thr Pro 485 490 495 His Ile Asn Asp Gly Ala Thr LeuArg Leu Glu Val Glu Gln Glu Ile 500 505 510 Ser Ser Ile Ala Pro Ser AlaGly Val Asn Ala Gln Ala Val Asp Leu 515 520 525 Val Thr Asn Lys Arg SerIle Lys Ser Val Ile Leu Ala Asp Asp Gly 530 535 540 Gln Val Ile Val LeuGly Gly Leu Ile Gln Asp Asp Val Thr Ser Thr 545 550 555 560 Asp Ser LysVal Pro Leu Leu Gly Asp Ile Pro Leu Ile Gly Arg Leu 565 570 575 Phe ArgSer Thr Lys Asp Thr His Val Lys Arg Asn Leu Met Val Phe 580 585 590 LeuArg Pro Thr Ile Val Arg Asp Arg Ala Gly Met Ala Ala Leu Ser 595 600 605Gly Lys Lys Tyr Ser Asp Ile Ser Val Leu Gly Ala Asp Glu Asp Gly 610 615620 His Ser Ser Leu Pro Gly Ser Ala Glu Arg Leu Phe Asp Lys Pro Gly 625630 635 640 Ala Gly Ala Val Asp Leu Arg Asp Gln 645 2742 base pairsnucleic acid single linear unknown 16 ATGTCTGTTT GGGTCACGTG GCCGGGCTTGGTCAAGTTCG GCACCCTGGG CATCTATGCC 60 GGCCTGATCA CGCTCGCGCT TGAGCGCGACGTGCTGTTCA AGAACAACCT GTTCGACGTC 120 GACAACCTGC CCGCGGCCAA CGCCAGCATCACCTGTGATG CCCGCAGCCA GGTGGCGCGT 180 ACCGAGGACG GCACCTGTAA CATCCTCGCCAACCCGGCCG AGGGCTCGGT GTACCGCCGC 240 TTCGGGCGCA ACGTCGACCC CAGCGTGACCCATGGCGAGA CCGAGGCCGA CACCCTGCTC 300 AGTCCCAATC CGCGGGAGGT GAGTAACGTGCTGATGGCGC GTGGCGAGTT CAAGCCGGCG 360 CCCAGCCTCA ACTTCATCGC CGCCTCCTGGATCCAGTTCA TGGTGCATGA CTGGGTCGAA 420 CACGGCCCCA ACGCCGAAGC CAACCCGATCCAGGTGCCGC TGCCGGCTGG CGACGCGCTC 480 GGCTCCGGCA GCCTGTCCGT GCGCCGCACCCAGCCCGACC CGACCCGTAC CCCGGCCGAG 540 GCCGGCAAGC CGGCCACCTA CCGCAACCACAACACCCACT GGTGGGATGG CTCGCAGTTG 600 TATGGCAGCA GCAAGGACAT CAACGACAAGGTGCGCGCCT TCGAGGGTGG CAAGCTGAAG 660 ATCAATCCCG ACGGTACCCT GCCGACCGAGTTCCTCAGCG GCAAGCCGAT CACCGGCTTC 720 AACGAGAACT GGTGGGTTGG CCTGAGCATGCTGCACCAGC 
TGTTCACTAA GGAGCACAAC 780 GCCATCGCGG CGATGCTCCA GCAGAAGTACCCGGACAAGG ACGACCAGTG GCTGTACGAC 840 CATGCGCGCC TGGTCAACTC CGCGCTGATGGCCAAGATCC ACACCGTGGA ATGGACCCCG 900 GCGGTGATCG CCAACCCGGT CACCGAACGCGCCATGTATG CCAACTGGTG GGGCCTGCTG 960 GGTTCCGGTC CGGAGCGTGA CAAGTACCAGGAAGAGGCGC GCATGCTGCA GGAGGACCTG 1020 GCCAGCTCCA ACTCCTTCGT CCTGCGCATTCTCGGCATCG ACGGCAGCCA GGCCGGCAGT 1080 TCGGCCATCG ACCATGCCCT GGCCGGCATCGTCGGCTCGA CCAACCCGAA CAACTACGGC 1140 GTGCCCTACA CCCTGACCGA GGAGTTCGTCGCGGTCTACC GCATGCACCC GCTGATGCGC 1200 GACAAGGTCG ATGTCTACGA CATCGGCTCGAACATCATCG CGCGCAGCGT GCCGCTGCAG 1260 GAGACCCGCG ATGCCGACGC CGAGGAGCTGCTGGCGGACG AGAATCCCGA GCGCCTGTGG 1320 TACTCCTTCG GCATCACCAA CCCGGGCTCGCTGACCCTCA ACAACTACCC GAACTTCCTG 1380 CGCAACCTGT CCATGCCGCT GGTCGGCAACATCGACCTGG CGACCATCGA CGTGCTGTGT 1440 GACCGCGAGC GCGGGGTGCC GCGCTACAACGAGTTCCGCC GCGAGATCGG CCTCAACCCG 1500 ATCACCAAGT TGGAGGACCT GACCACCGACCCGGCCACCC TGGCCAACCT CAAGCGCATC 1560 TACGGCAACG ACATCGAGAA GATTGACACCCTGGTCGGCA TGCTGGCCGA GACCGTGCGT 1620 CCGGACGGCT TCGCCTTCGG CGAGACGGCCTTCCAGATCT TCATCATGAA CGCCTCGCGG 1680 CGCCTGATGA CCGACCGCTT CTATACCAAGGACTACCGCC CGGAGATCTA CACCGCCGAG 1740 GGCCTGGCCT GGGTCGAGAA CACCACCATGGTCGACGTGC TCAAACGCCA CAATCCGCAG 1800 CTGGTCAACA GCCTGGTTGG CGTGGAAAACGCCTTCAAAC CCTGGGGCCT GAACATCCCG 1860 GCCGACTACG AGAGCTGGCC GGGCAAGGCCAAGCAGGACA ACCTGTGGGT CAACGGCGCC 1920 NTGCGCACCC AGTACGCCGC AGGCCAGCTGCCGGCCATTC CGCCGGTGGA CGTCGGCGGC 1980 CTGATCAGTT CGGTGCTGTG GAAGAAGGTGCAGACCAANT CCGACGTGGC GCCGGCCGGC 2040 TACGAGAAGG CCATGCACCC GCATGGCGTGATGGCCAAGG TCAAGTTCAC CGCCGTGCCG 2100 GGGCACCCCT ACACCGGCCT GTTCCAGGGTGCCGACAGCG GCCTGCTGCG CCTGTCGGTG 2160 GCCGGCGACC CGGCAACCAA CGGCTTCCAGCCGGGTCTGG CGTGGAAGGC CTTCGTCGAC 2220 GGCAAGCCGT CGCAGAACGT CTCCGCGCTCTACACCCTGA GCGGGCAGGG CAGCAACCAC 2280 AACTTCTTCG CCAACGAGCT GTCGCAGTTCGTCCTGCCGG AGACCAACGA TACCCTGGGC 2340 ACCACGCTGC TGTTCTCGCT GGTCAGCCTCAAGCCGACCT TGCTGCGCGT GGACGACATG 2400 GCCGAAGTGA CCCAGACCGG CCAGGCCGTGACTTCGGTCA AGGCGCCGAC GCAGATCTAC 2460 TTCGTGCCCA 
AGCCGGAGCT GCGCAGCCTGTTCTCCAGTG CGGCGCATGA CTTCCGCAGC 2520 GACCTGACGA GCCTCACCGC CGGCACCAAGCTGTACGACG TCTACGCTAC CTCGATGGAG 2580 ATCAAGACCT CGATCCTGCC GTCGACCAATCGTAGCTACG CCCAGCAACG GCGCAACAGC 2640 GCGGTGAAGA TCGGCGAGAT GGAGCTGACCTCGCCGTTCA TCGCCTCGGC CTTCGGCGAC 2700 AACGGGGTGT TCTTCAAGCA CCAGCGTCACGAAGACAAAT AA 2742 913 amino acids amino acid single linear unknown 17Met Ser Val Trp Val Thr Trp Pro Gly Leu Val Lys Phe Gly Thr Leu 1 5 1015 Gly Ile Tyr Ala Gly Leu Ile Thr Leu Ala Leu Glu Arg Asp Val Leu 20 2530 Phe Lys Asn Asn Leu Phe Asp Val Asp Asn Leu Pro Ala Ala Asn Ala 35 4045 Ser Ile Thr Cys Asp Ala Arg Ser Gln Val Ala Arg Thr Glu Asp Gly 50 5560 Thr Cys Asn Ile Leu Ala Asn Pro Ala Glu Gly Ser Val Tyr Arg Arg 65 7075 80 Phe Gly Arg Asn Val Asp Pro Ser Val Thr His Gly Glu Thr Glu Ala 8590 95 Asp Thr Leu Leu Ser Pro Asn Pro Arg Glu Val Ser Asn Val Leu Met100 105 110 Ala Arg Gly Glu Phe Lys Pro Ala Pro Ser Leu Asn Phe Ile AlaAla 115 120 125 Ser Trp Ile Gln Phe Met Val His Asp Trp Val Glu His GlyPro Asn 130 135 140 Ala Glu Ala Asn Pro Ile Gln Val Pro Leu Pro Ala GlyAsp Ala Leu 145 150 155 160 Gly Ser Gly Ser Leu Ser Val Arg Arg Thr GlnPro Asp Pro Thr Arg 165 170 175 Thr Pro Ala Glu Ala Gly Lys Pro Ala ThrTyr Arg Asn His Asn Thr 180 185 190 His Trp Trp Asp Gly Ser Gln Leu TyrGly Ser Ser Lys Asp Ile Asn 195 200 205 Asp Lys Val Arg Ala Phe Glu GlyGly Lys Leu Lys Ile Asn Pro Asp 210 215 220 Gly Thr Leu Pro Thr Glu PheLeu Ser Gly Lys Pro Ile Thr Gly Phe 225 230 235 240 Asn Glu Asn Trp TrpVal Gly Leu Ser Met Leu His Gln Leu Phe Thr 245 250 255 Lys Glu His AsnAla Ile Ala Ala Met Leu Gln Gln Lys Tyr Pro Asp 260 265 270 Lys Asp AspGln Trp Leu Tyr Asp His Ala Arg Leu Val Asn Ser Ala 275 280 285 Leu MetAla Lys Ile His Thr Val Glu Trp Thr Pro Ala Val Ile Ala 290 295 300 AsnPro Val Thr Glu Arg Ala Met Tyr Ala Asn Trp Trp Gly Leu Leu 305 310 315320 Gly Ser Gly Pro Glu Arg Asp Lys Tyr Gln Glu Glu Ala Arg Met Leu 325330 335 Gln Glu Asp Leu Ala Ser Ser Asn Ser Phe Val Leu Arg Ile 
Leu Gly340 345 350 Ile Asp Gly Ser Gln Ala Gly Ser Ser Ala Ile Asp His Ala LeuAla 355 360 365 Gly Ile Val Gly Ser Thr Asn Pro Asn Asn Tyr Gly Val ProTyr Thr 370 375 380 Leu Thr Glu Glu Phe Val Ala Val Tyr Arg Met His ProLeu Met Arg 385 390 395 400 Asp Lys Val Asp Val Tyr Asp Ile Gly Ser AsnIle Ile Ala Arg Ser 405 410 415 Val Pro Leu Gln Glu Thr Arg Asp Ala AspAla Glu Glu Leu Leu Ala 420 425 430 Asp Glu Asn Pro Glu Arg Leu Trp TyrSer Phe Gly Ile Thr Asn Pro 435 440 445 Gly Ser Leu Thr Leu Asn Asn TyrPro Asn Phe Leu Arg Asn Leu Ser 450 455 460 Met Pro Leu Val Gly Asn IleAsp Leu Ala Thr Ile Asp Val Leu Cys 465 470 475 480 Asp Arg Glu Arg GlyVal Pro Arg Tyr Asn Glu Phe Arg Arg Glu Ile 485 490 495 Gly Leu Asn ProIle Thr Lys Leu Glu Asp Leu Thr Thr Asp Pro Ala 500 505 510 Thr Leu AlaAsn Leu Lys Arg Ile Tyr Gly Asn Asp Ile Glu Lys Ile 515 520 525 Asp ThrLeu Val Gly Met Leu Ala Glu Thr Val Arg Pro Asp Gly Phe 530 535 540 AlaPhe Gly Glu Thr Ala Phe Gln Ile Phe Ile Met Asn Ala Ser Arg 545 550 555560 Arg Leu Met Thr Asp Arg Phe Tyr Thr Lys Asp Tyr Arg Pro Glu Ile 565570 575 Tyr Thr Ala Glu Gly Leu Ala Trp Val Glu Asn Thr Thr Met Val Asp580 585 590 Val Leu Lys Arg His Asn Pro Gln Leu Val Asn Ser Leu Val GlyVal 595 600 605 Glu Asn Ala Phe Lys Pro Trp Gly Leu Asn Ile Pro Ala AspTyr Glu 610 615 620 Ser Trp Pro Gly Lys Ala Lys Gln Asp Asn Leu Trp ValAsn Gly Ala 625 630 635 640 Xaa Arg Thr Gln Tyr Ala Ala Gly Gln Leu ProAla Ile Pro Pro Val 645 650 655 Asp Val Gly Gly Leu Ile Ser Ser Val LeuTrp Lys Lys Val Gln Thr 660 665 670 Xaa Ser Asp Val Ala Pro Ala Gly TyrGlu Lys Ala Met His Pro His 675 680 685 Gly Val Met Ala Lys Val Lys PheThr Ala Val Pro Gly His Pro Tyr 690 695 700 Thr Gly Leu Phe Gln Gly AlaAsp Ser Gly Leu Leu Arg Leu Ser Val 705 710 715 720 Ala Gly Asp Pro AlaThr Asn Gly Phe Gln Pro Gly Leu Ala Trp Lys 725 730 735 Ala Phe Val AspGly Lys Pro Ser Gln Asn Val Ser Ala Leu Tyr Thr 740 745 750 Leu Ser GlyGln Gly Ser Asn His Asn Phe Phe Ala Asn Glu Leu Ser 755 760 765 Gln PheVal 
Leu Pro Glu Thr Asn Asp Thr Leu Gly Thr Thr Leu Leu 770 775 780
Phe Ser Leu Val Ser Leu Lys Pro Thr Leu Leu Arg Val Asp Asp Met 785 790 795 800
Ala Glu Val Thr Gln Thr Gly Gln Ala Val Thr Ser Val Lys Ala Pro 805 810 815
Thr Gln Ile Tyr Phe Val Pro Lys Pro Glu Leu Arg Ser Leu Phe Ser 820 825 830
Ser Ala Ala His Asp Phe Arg Ser Asp Leu Thr Ser Leu Thr Ala Gly 835 840 845
Thr Lys Leu Tyr Asp Val Tyr Ala Thr Ser Met Glu Ile Lys Thr Ser 850 855 860
Ile Leu Pro Ser Thr Asn Arg Ser Tyr Ala Gln Gln Arg Arg Asn Ser 865 870 875 880
Ala Val Lys Ile Gly Glu Met Glu Leu Thr Ser Pro Phe Ile Ala Ser 885 890 895
Ala Phe Gly Asp Asn Gly Val Phe Phe Lys His Gln Arg His Glu Asp 900 905 910
Lys
525 base pairs nucleic acid single linear unknown 18
ATGCAGCGGG GGCGCGGTTT CACTCTGATC GAGCTGCTGG TGGTGCTGGT GCTGCTGGGC 60
GTGCTCACCG GCCTCGCCGT GCTCGGCAGC GGGATCGCCA GCAGCCCCGC GCGCAAGCTG 120
GCGGACGAGG CCGAGCGCCT GCAGTCGCTG CTGCGGGTGC TGCTCGACGA GGCGGTGCTG 180
GACAACCGCG AGTATGGCGT ACGCTTCGAC GCCCGGAGCT ACCGGGTGCT GCGCTTCGAG 240
CCGCGCACGG CGCGCTGGGA GCCGCTCGAC GAGCGCGTGC ACGAGCTGCC GGAGTGGCTC 300
GAGCTGGAGA TCGAGGTCGA CGAGCAGAGT GTCGGGCTGC CCGCCGCCCG TGGCGAGCAG 360
GACAAAGCCG CGGCCAAGGC GCCACAGCTG CTGCTGCTCT CCAGTGGCGA GCTGACCCCC 420
TTCGCCCTGC GCCTGTCCGC CGGCCGCGAG CGCGGCGCGC CGGTGCTGAC GCTGGCCAGC 480
GACGGCTTCG CCGAGCCCGA GCTGCAGCAG GAAAAGTCCC GATGA 525
174 amino acids amino acid single linear unknown 19
Met Gln Arg Gly Arg Gly Phe Thr Leu Ile Glu Leu Leu Val Val Leu 1 5 10 15
Val Leu Leu Gly Val Leu Thr Gly Leu Ala Val Leu Gly Ser Gly Ile 20 25 30
Ala Ser Ser Pro Ala Arg Lys Leu Ala Asp Glu Ala Glu Arg Leu Gln 35 40 45
Ser Leu Leu Arg Val Leu Leu Asp Glu Ala Val Leu Asp Asn Arg Glu 50 55 60
Tyr Gly Val Arg Phe Asp Ala Arg Ser Tyr Arg Val Leu Arg Phe Glu 65 70 75 80
Pro Arg Thr Ala Arg Trp Glu Pro Leu Asp Glu Arg Val His Glu Leu 85 90 95
Pro Glu Trp Leu Glu Leu Glu Ile Glu Val Asp Glu Gln Ser Val Gly 100 105 110
Leu Pro Ala Ala Arg Gly Glu Gln Asp Lys Ala Ala Ala Lys Ala Pro 115 120 125
Gln Leu Leu Leu Leu Ser Ser Gly
Glu Leu Thr Pro Phe Ala Leu Arg 130 135 140
Leu Ser Ala Gly Arg Glu Arg Gly Ala Pro Val Leu Thr Leu Ala Ser 145 150 155 160
Asp Gly Phe Ala Glu Pro Glu Leu Gln Gln Glu Lys Ser Arg 165 170
390 base pairs nucleic acid single linear unknown 20
ATGAAGCGCG GCCGCGGCTT CACCCTGCTC GAGGTGCTGG TGGCCCTGGC GATCTTCGCC 60
GTGGTCGCCG CCAGCGTGCT CAGCGCCAGC GCTCGCTCGC TGAAGACCGC CGCGCGCCTG 120
GAGGACAAGA CCTTCGCCAC CTGGCTGGCG GACAACCGCC TGCAGGAGCT GCAGCTGGCC 180
GACGTGCCGC CGGGCGAGGG CCGCGAGCAG GGCGAGGAGA GCTACGCCGG GCGGCGCTGG 240
CTGTGGCAGA GCGAGGTGCA GGCCACCAGC GAGCCGGAGA TGCTGCGTGT CACCGTACGG 300
GTGGCGCTGC GGCCGGAGCG CGGGCTGCAG GGCAAGATCG AAGACCATGC CCTGGTGACC 360
CTGAGTGGCT TCGTCGGGGT CGAGCCATGA 390
129 amino acids amino acid single linear unknown 21
Met Lys Arg Gly Arg Gly Phe Thr Leu Leu Glu Val Leu Val Ala Leu 1 5 10 15
Ala Ile Phe Ala Val Val Ala Ala Ser Val Leu Ser Ala Ser Ala Arg 20 25 30
Ser Leu Lys Thr Ala Ala Arg Leu Glu Asp Lys Thr Phe Ala Thr Trp 35 40 45
Leu Ala Asp Asn Arg Leu Gln Glu Leu Gln Leu Ala Asp Val Pro Pro 50 55 60
Gly Glu Gly Arg Glu Gln Gly Glu Glu Ser Tyr Ala Gly Arg Arg Trp 65 70 75 80
Leu Trp Gln Ser Glu Val Gln Ala Thr Ser Glu Pro Glu Met Leu Arg 85 90 95
Val Thr Val Arg Val Ala Leu Arg Pro Glu Arg Gly Leu Gln Gly Lys 100 105 110
Ile Glu Asp His Ala Leu Val Thr Leu Ser Gly Phe Val Gly Val Glu 115 120 125
Pro
684 base pairs nucleic acid single linear unknown 22
ATGAGGCAGC GCGGCTTCAC CCTGCTGGAA GTGCTGATCG CCATCGCCAT CTTCGCCCTG 60
CTGGCCATGG CCACCTACCG CATGCTCGAC AGCGTGCTGC AGACCGATCG TGGCCAGCGC 120
CAGCAGGAGC AGCGTCTGCG CGAGCTGACG CGGGCCATGG CAGCTTTCGA ACGCGACCTG 180
CTGCAGGTGC GCCTGCGTCC GGTGCGCGAC CCGCTGGGCG ACCTGCTGCC AGCCCTGCGC 240
GGCAGCAGTG GCCGCGACAC CCAGCTGGAG TTCACCCGCA GCGGCTGGCG CAACCCGCTC 300
GGCCAGCCGC GCGCCACCCT ACAGCGGGTG CGCTGGCAGC TCGAAGGCGA GCGCTGGCAG 360
CGCGCTTACT GGACGGTGCT GGACCAGGCC CAGGACAGCC AGCCGCGGGT GCAGCAGGCG 420
CTGGATGGCG TGCGCCGCTT CGACTTGCGC TTTCTCGACC AGGAGGGGCG CTGGCTGCAG 480
GACTGGCCGC CGGCCAACAG TGCTGCCGAC GAGGCCCTGA CCCAGCTGCC GCGTGCCGTC 540
GAGCTGGTCG TCGAGCACCG CCATTACGGT GAACTGCGCC GTCTCTGGCG CTTGCCCGAG 600
ATGCCGCAGC AGGAACAGAT CACGCCGCCC GGGGGCGAGC AGGGCGGTGA GCTGCTGCCG 660
GAAGAGCCGG AGCCCGAGGC ATGA 684
227 amino acids amino acid single linear unknown 23
Met Arg Gln Arg Gly Phe Thr Leu Leu Glu Val Leu Ile Ala Ile Ala 1 5 10 15
Ile Phe Ala Leu Leu Ala Met Ala Thr Tyr Arg Met Leu Asp Ser Val 20 25 30
Leu Gln Thr Asp Arg Gly Gln Arg Gln Gln Glu Gln Arg Leu Arg Glu 35 40 45
Leu Thr Arg Ala Met Ala Ala Phe Glu Arg Asp Leu Leu Gln Val Arg 50 55 60
Leu Arg Pro Val Arg Asp Pro Leu Gly Asp Leu Leu Pro Ala Leu Arg 65 70 75 80
Gly Ser Ser Gly Arg Asp Thr Gln Leu Glu Phe Thr Arg Ser Gly Trp 85 90 95
Arg Asn Pro Leu Gly Gln Pro Arg Ala Thr Leu Gln Arg Val Arg Trp 100 105 110
Gln Leu Glu Gly Glu Arg Trp Gln Arg Ala Tyr Trp Thr Val Leu Asp 115 120 125
Gln Ala Gln Asp Ser Gln Pro Arg Val Gln Gln Ala Leu Asp Gly Val 130 135 140
Arg Arg Phe Asp Leu Arg Phe Leu Asp Gln Glu Gly Arg Trp Leu Gln 145 150 155 160
Asp Trp Pro Pro Ala Asn Ser Ala Ala Asp Glu Ala Leu Thr Gln Leu 165 170 175
Pro Arg Ala Val Glu Leu Val Val Glu His Arg His Tyr Gly Glu Leu 180 185 190
Arg Arg Leu Trp Arg Leu Pro Glu Met Pro Gln Gln Glu Gln Ile Thr 195 200 205
Pro Pro Gly Gly Glu Gln Gly Gly Glu Leu Leu Pro Glu Glu Pro Glu 210 215 220
Pro Glu Ala 225
954 base pairs nucleic acid single linear unknown 24
ATGAGCCGGC AGCGCGGCGT GGCACTGATC ACCGTGCTGC TGGTGGTGGC GCTGGTGACC 60
GTGGTCTGCG CGGCCCTGCT GCTGCGCCAG CAGCTGGCCA TCCGCAGCAC CGGCAACCAG 120
CTGCTGGTGC GCCAGGCCCA GTACTACGCC GAAGGCGGCG AGCTGCTGGC CAAGGCCCTG 180
CTGCGTCGCG ACCTGGCCGC CGACCAGGTC GATCATCCCG GCGAGCCCTG GGCCAACCCC 240
GGCCTGCGCT TCCCCCTGGA TGAGGGCGGC GAGCTGCGCC TGCGCATCGA GGACCTGGCC 300
GGACGTTTCA ACCTCAACAG CCTGGCCGCC GGTGGTGAGG CCGGTGAGTT GGCGCTGCTG 360
CGCCTGCGGC GCCTGCTGCA GCTGCTGCAG CTGACCCCGG CCTATGCCGA GCGCCTGCAG 420
GACTGGCTCG ACGGCGATCA GGAGGCCAGC GGCATGGCCG GCGCCGAGGA TGACCAGTAC 480
CTGCTGCAGA AACCGCCCTA CCGTACCGGC CCCGGGCGCA TTGCCGAGGT GTCGGAGCTG 540
CGCCTGCTGC TGGGCATGAG CGAGGCCGAC TACCGCCGCC
TGGCCCCCTTCGTCAGCGCC 600 CTGCCGAGCC AGGTCGAGCT GAACATCAAC ACCGCCAGCG CCCTGGTGCTGGCTTGCCTG 660 GGCGAGGGCA TNCCCGAGGC GGTGCTCGAG GCCGCCATCG ANGGTCGCGGCCGCAGCGGC 720 TATCGCGAGC CCGCTGCCTT CGTCCAGCAN CTTGCCAGCT ACGGCGTCAGCCCGCAGGGG 780 CTGGGCATCG CCAGCCAGTA TTTCCGTGTC ACCACCGAGG TGCTGCTGGGTGAGCGGCGC 840 CAGGTGCTGG CCAGTTATCT GCAACGTGGT AATGATGGGC GCGTCCGCCTGATGGCGCGC 900 GATCTGGGGC AGGAGGGCCT GGCGCCCCCA CCCGTCGAGG AGTCCGAGAAATGA 954 317 amino acids amino acid single linear unknown 25 Met Ser ArgGln Arg Gly Val Ala Leu Ile Thr Val Leu Leu Val Val 1 5 10 15 Ala LeuVal Thr Val Val Cys Ala Ala Leu Leu Leu Arg Gln Gln Leu 20 25 30 Ala IleArg Ser Thr Gly Asn Gln Leu Leu Val Arg Gln Ala Gln Tyr 35 40 45 Tyr AlaGlu Gly Gly Glu Leu Leu Ala Lys Ala Leu Leu Arg Arg Asp 50 55 60 Leu AlaAla Asp Gln Val Asp His Pro Gly Glu Pro Trp Ala Asn Pro 65 70 75 80 GlyLeu Arg Phe Pro Leu Asp Glu Gly Gly Glu Leu Arg Leu Arg Ile 85 90 95 GluAsp Leu Ala Gly Arg Phe Asn Leu Asn Ser Leu Ala Ala Gly Gly 100 105 110Glu Ala Gly Glu Leu Ala Leu Leu Arg Leu Arg Arg Leu Leu Gln Leu 115 120125 Leu Gln Leu Thr Pro Ala Tyr Ala Glu Arg Leu Gln Asp Trp Leu Asp 130135 140 Gly Asp Gln Glu Ala Ser Gly Met Ala Gly Ala Glu Asp Asp Gln Tyr145 150 155 160 Leu Leu Gln Lys Pro Pro Tyr Arg Thr Gly Pro Gly Arg IleAla Glu 165 170 175 Val Ser Glu Leu Arg Leu Leu Leu Gly Met Ser Glu AlaAsp Tyr Arg 180 185 190 Arg Leu Ala Pro Phe Val Ser Ala Leu Pro Ser GlnVal Glu Leu Asn 195 200 205 Ile Asn Thr Ala Ser Ala Leu Val Leu Ala CysLeu Gly Glu Gly Xaa 210 215 220 Pro Glu Ala Val Leu Glu Ala Ala Ile XaaGly Arg Gly Arg Ser Gly 225 230 235 240 Tyr Arg Glu Pro Ala Ala Phe ValGln Xaa Leu Ala Ser Tyr Gly Val 245 250 255 Ser Pro Gln Gly Leu Gly IleAla Ser Gln Tyr Phe Arg Val Thr Thr 260 265 270 Glu Val Leu Leu Gly GluArg Arg Gln Val Leu Ala Ser Tyr Leu Gln 275 280 285 Arg Gly Asn Asp GlyArg Val Arg Leu Met Ala Arg Asp Leu Gly Gln 290 295 300 Glu Gly Leu AlaPro Pro Pro Val Glu Glu Ser Glu Lys 305 310 315 1146 base pairs nucleicacid 
single linear unknown 26 ATGAGTCTGC TCACCCTGTT TCTGCCGCCCCAGGCCTGCA CCGAGGCGAG CGCCGACATG 60 CCGGTGTGGT GCGTCGAGAG CGACAGCTGCCGTCAGCTGC CCTTCGCCGA GGCCTTGCCG 120 GCCGACGCGC GGGTCTGGCG CTTGGTGCTGCCGGTGGAGG CGGTGACCAC CTGTGTCGTG 180 CAGTTGCCGA CCACCAAGGC ACGCTGGCTGGCCAAGGCCC TGCCGTTCGC CGTCGAGGAG 240 CTGCTGGCCG AGGAGGTGGA GCAGTTTCACCTGTGCGTCG GTAGCGCGCT GGTCGATGGT 300 CGTCATCGTG TTCATGCCCT GCGCCGCGAGTGGCTGGCCG GCTGGCTGGC GCTGTGCGGC 360 GAGCGGCCGC CGCAGTGGAT CGAGGTGGACGCCGACCTGT TGCCGGAGGA GGGTAGCCAG 420 CTGCTCTGCC TGGGCGAGCG CTGGTTGCTCGGCGGGTCGG GCGAGGCGCG CCTGGCCCTG 480 CGTGGCGAGG ACTGGCCGCA GCTGGCGGCGCTCTGTCCGC CGCCCCGGCA AGCCTATGTG 540 CCGCCCGGGC AGGCGGCGCC GCCGGGCGTCGAGGCCTGCC AGACGCTGGA GCAGCCGTGG 600 CTCTGGCTGG CCGCGCAGAA GTCCGGCTGCAACCTGGCCC AGGGGCCTTT CGCCCGTCGC 660 GAGCCTTCCG GCCAGTGGCA GCGCTGGCGGCCGCTGGCGG GGCTGCTCGG TCTCTGGCTG 720 GTGCTGCAKT GGGGCTTCAA CCTTGCCCANGGCTGGCAGC TGCAGCGCGA GGGTGAACGC 780 TATGCCGTGG CCAACGAGGC GCTGTATCGCGAGCTGTTCC CCGAGGATCG CAAGGTGATC 840 AACCTGCGTG CGCAGTTCGA CCAGCACCTGGCCGAGGCGG CTGGGAGCGG CCAGAGCCAG 900 TTGCTGGCCC TGCTCGATCA GGCCGCCGCGGCCATCGGCG AAGGGGGGGC GCAGGTGCAG 960 GTGGATCAGC TCGACTTCAA CGCCCAGCGTGGCGACCTGG CCTTCAACCT GCGTGCCAGC 1020 GACTTCGCCG CGCTGGAAAG CCTGCGGGCGCGCCTGCAGG AGGCCGGCCT GGCGGTGGAC 1080 ATGGGCTCGG CGAGCCGCGA GGACAACGGCGTCAGTGCGC GCCTGGTGAT CGGGGGTAAC 1140 GGATGA 1146 381 amino acids aminoacid single linear unknown 27 Met Ser Leu Leu Thr Leu Phe Leu Pro ProGln Ala Cys Thr Glu Ala 1 5 10 15 Ser Ala Asp Met Pro Val Trp Cys ValGlu Ser Asp Ser Cys Arg Gln 20 25 30 Leu Pro Phe Ala Glu Ala Leu Pro AlaAsp Ala Arg Val Trp Arg Leu 35 40 45 Val Leu Pro Val Glu Ala Val Thr ThrCys Val Val Gln Leu Pro Thr 50 55 60 Thr Lys Ala Arg Trp Leu Ala Lys AlaLeu Pro Phe Ala Val Glu Glu 65 70 75 80 Leu Leu Ala Glu Glu Val Glu GlnPhe His Leu Cys Val Gly Ser Ala 85 90 95 Leu Val Asp Gly Arg His Arg ValHis Ala Leu Arg Arg Glu Trp Leu 100 105 110 Ala Gly Trp Leu Ala Leu CysGly Glu Arg Pro Pro Gln Trp Ile Glu 115 120 125 Val Asp Ala 
Asp Leu LeuPro Glu Glu Gly Ser Gln Leu Leu Cys Leu 130 135 140 Gly Glu Arg Trp LeuLeu Gly Gly Ser Gly Glu Ala Arg Leu Ala Leu 145 150 155 160 Arg Gly GluAsp Trp Pro Gln Leu Ala Ala Leu Cys Pro Pro Pro Arg 165 170 175 Gln AlaTyr Val Pro Pro Gly Gln Ala Ala Pro Pro Gly Val Glu Ala 180 185 190 CysGln Thr Leu Glu Gln Pro Trp Leu Trp Leu Ala Ala Gln Lys Ser 195 200 205Gly Cys Asn Leu Ala Gln Gly Pro Phe Ala Arg Arg Glu Pro Ser Gly 210 215220 Gln Trp Gln Arg Trp Arg Pro Leu Ala Gly Leu Leu Gly Leu Trp Leu 225230 235 240 Val Leu Xaa Trp Gly Phe Asn Leu Ala Xaa Gly Trp Gln Leu GlnArg 245 250 255 Glu Gly Glu Arg Tyr Ala Val Ala Asn Glu Ala Leu Tyr ArgGlu Leu 260 265 270 Phe Pro Glu Asp Arg Lys Val Ile Asn Leu Arg Ala GlnPhe Asp Gln 275 280 285 His Leu Ala Glu Ala Ala Gly Ser Gly Gln Ser GlnLeu Leu Ala Leu 290 295 300 Leu Asp Gln Ala Ala Ala Ala Ile Gly Glu GlyGly Ala Gln Val Gln 305 310 315 320 Val Asp Gln Leu Asp Phe Asn Ala GlnArg Gly Asp Leu Ala Phe Asn 325 330 335 Leu Arg Ala Ser Asp Phe Ala AlaLeu Glu Ser Leu Arg Ala Arg Leu 340 345 350 Gln Glu Ala Gly Leu Ala ValAsp Met Gly Ser Ala Ser Arg Glu Asp 355 360 365 Asn Gly Val Ser Ala ArgLeu Val Ile Gly Gly Asn Gly 370 375 380 4377 base pairs nucleic acidsingle linear unknown 28 GAATTCGCCG CCGAGCTGGC CAAGCCGCTG GGCGCGGTGACCGCACAGAA GGAAGTGGAG 60 CGTGCCCTGC GCGACCTGCA CCTGCCCTTC GACGAGCGCCGTCCCTACGC CCTGCGCCGT 120 CTGCGCGACC GCATCGAGGC CAATCTCTCC GGCCTGATGGGCCCCAGCGT GGCCCAGGAC 180 ATGGTGGAAA CCTTCCTGCC CTACAAGGCC GGCAGCGAGGCCTATGTCAG CGAAGACATC 240 CACTTCATCG AGAGTCGCCT GGAGGATTAC CAGTCGCGCCTCACCGGCCT GGCCGCCGAG 300 CTCGACGCGC TGCGCCGCTT CCACCGCCAG ACCCTGCAGGAACTGCCGAT GGGCGTATGT 360 TCGCTGGCCA AGGACCAGGA AGTGCTGATG TGGAACCGCGCCATGGAGGA ACTCACCGGC 420 ATCAGCGCGC AGCAGGTGGT CGGCTCGCGC CTGCTCAGCCTGGAGCACCC CTGGCGCGAG 480 CTGCTGCAGG ACTTCATCGC CCAGGACGAG GAGCACCTGCACAAGCAGCA CCTGCAACTG 540 GACGGCGAGG TGCGCTGGCT CAACCTGCAC AAGGCGGCCATCGACGAACC GCTGGCGCCG 600 GGCAACAGCG GCCTGGTGCT GCTGGTCGAG GACGTCACCGAGACCCGCGT GCTGGAAGAC 660 
CAGCTGGTGC ACTCCGAGCG TCTGGCCAGC ATCGGCCGCCTGGCCGCCGG GGTGGCCCAC 720 GAGATCGGCA ATCCGGTCAC CGGCATCGCC TGCCTGGCGCAGAACCTGCG CGAGGAGCGC 780 GAGGGCGACG AGGAGCTCGG CGAGATCAGC AACCAGATCCTCGACCAGAC CAAGCGCATC 840 TCGCGCATCG TCCAGTCGCT GATGAACTTC GCCCACGCCGGCCAGCAGCA GCGCGCCGAA 900 TACCCGGTGA GCCTGGCCGA AGTGGCGCAG GACGCCATCGGCCTGCTGTC GCTGAACCGC 960 CATGGCACCG AAGTGCAGTT CTACAACCTG TGCGATCCCGAGCACCTGGC CAAGGGCGAC 1020 CCGCAGCGCC TGGCCCAGGT GCTGATCAAC CTGCTGTCCAACGCCCGCGA TGCCTCGCCG 1080 GCCGGCGGTG CCATCCGCGT GCGTAGCGAG GCCGAGGAGCAGAGCGTGGT GCTGATCGTC 1140 GAGGACGAGG GCACGGGCAT TCCGCAGGCG ATCATGGACCGCCTGTTCGA ACCCTTCTTC 1200 ACCACCAAGG ACCCCGGCAA GGGCACCGGT TTGGGGCTCGCGCTGGTCTA TTCGATCGTG 1260 GAAGAGCATT ATGGGCAGAT CACCATCGAC AGCCCGGCCGATCCCGAGCA CCAGCGCGGA 1320 ACCCGTTTCC GCGTGACCCT GCCGCGCTAT GTCGAAGCGACGTCCACAGC GACCTGAGTA 1380 GTGACCTAGA ACCGCCGAGG GGCCACAAGC CCGGCGGATTCGGAGACCGT CGAGAGAACA 1440 CAATGCCGCA TATCCTCATC GTCGAAGACG AAACCATCATCCGCTCCGCC CTGCGCCGCC 1500 TGCTGGAACG CAACCAGTAC CAGGTCAGCG AGGCCGGTTCGGTTCAGGAG GCCCAGGAGC 1560 GCTACAGCAT TCCGACCTTC GACCTGGTGG TCAGCGACCTGCGCCTGCCC GGCGCCCCCG 1620 GCACCGAGCT GATCAAGCTG GCCGACGGCA CCCCGGTACTGATCATGACC AGCTATGCCA 1680 GCCTGCGCTC GGCGGTGGAC TCGATGAAGA TGGGCGCGGTGGACTACATC GCCAAGCCCT 1740 TCGATCACGA CGAGATGCTC CAGGCCGTGG CGCGTATCCTGCGCGATCAC CAGGAGGCCA 1800 AGCGCAACCC GCCAAGCGAG GCGCCCAGCA AGTCCGCCGGCAAGGGCAAC GGCGCCACCG 1860 CCGAGGGCGA GATCGGCATC ATCGGCTCCT GCGCCGCCATGCAGGACCTT TACGGCAAGA 1920 TCCGCAAGGT CGCTCCCACC GATTCCAACG TACTGATCCAGGGCGAGTCC GGCACCGGCA 1980 AGGAGCTGGT CGCGCGTGCG CTGCACAACC TCTCGCGTCGCGCCAAGGCA CCGCTGATCT 2040 CGGTGAACTG CGCGGCCATC CCCGAGACCC TGATCGAGTCCGAACTGTTC GGCCACGAGA 2100 AAGGTGCCTT CACCGGCGCC AGCGCCGGCC GCGCCGGCCTGGTCGAAGCG GCCGACGGCG 2160 GCACCCTGTT CCTCGACGAG ATCGGCGAGC TGCCGCTGGAGGCGCAGGCC CGCCTGCTGC 2220 GCGTGCTGCA GGAGGGCGAG ATCCGTCGGG TCGGCTCGGTGCAGTCACAG AAGGTCGATG 2280 TACGCCTGAT CGCCGCTACC CACCGCGACC TCAAGACGCTGGCCAAGACC GGCCAGTTCC 2340 GCGAGGACCT CTACTACCGC CTGCACGTCA 
TCGCCCTCAAGCTGCCGCCA CTGCGCGAGC 2400 GCGGCGCCGA CGTCAACGAG ATCGCCCGCG CCTTCCTCGTCCGCCAGTGC CAGCGCATGG 2460 GCCGCGAGGA CCTGCGCTTC GCTCAGGATG CCGAGCAGGCGATCCGCCAC TACCCCTGGC 2520 CGGGCAACGT GCGCGAGCTG GAGAATGCCA TCGAGCGCGCGGTGATCCTC TGCGAGGGCG 2580 CGGAAATTTC CGCCGAGCTG CTGGGCATCG ACATCGAGCTGGACGACCTG GAGGACGGCG 2640 ACTTCGGCGA ACAGCCACAG CAGACCGCGG CCAACCACGAACCGACCGAG GACCTGTCGC 2700 TGGAGGACTA CTTCCAGCAC TTCGTACTGG AGCACCAGGATCACATGACC GAGACCGAAC 2760 TGGCGCGCAA GCTCGGCATC AGCCGCAAGT GCCTGTGGGAGCGCCGTCAG CGCCTGGGCA 2820 TTCCGCGGCG CAAGTCGGGC GCGGCGACCG GCTCCTGAACGGGACGAACG GTGACAGGCC 2880 TCGCCGCAAA AGGTTCCGCG CCTGTTACCC CGCACAAATATCGCGTAACA AAAGCCGGGT 2940 TCATCGGTAA CGGGAACCCG GCTTTTTTCT GCCCGCCGCCCGCACCAAAA AATCATAACT 3000 CATTGAAAAA CAAGGAATTA CAAAAACTGG CACGGCTTCTGCTTTATCTC TGGCACAACA 3060 ACAATAACAA CGCTCGAAAC CTCAACAATA AAAACAATACAGAACGACTC CAGCACAACA 3120 AAAACAACAA CGCGGAGGCG CAGCTAACTG ATTCTTTTGGAGAGGATTTG CCCTTGGGGT 3180 TCGCCCCACA ACCAGGCCGA GAACAACAAA AACTGCACTAAAGCAGCGCC TGCACTGGTT 3240 GGGTCATGGA ATGATCAAGG CAGCATCAGC ATCCAAAGCAATCCGTTTGC TCCTGGTACC 3300 CGATTTGGGC TACCTGAAAC GGGCCTACAA CAAAAACAACAGGCCCGCAC AATAATAAAA 3360 ACAAAGCACG CACCTATTTG GGGGGGAGCT TCGGCTCCCCCAGTAGCTTC ACCCCACCTC 3420 GCGTTCCCCA GCCTGCCTTT TCCACCATCC CCCTTCCCGATGCTAGAATC CGCGCCAATC 3480 CTGCGGCGAT CTGCAATTGT GGCCGCCTAT TCCTGCAAACAGTGCATCCC ATGCTGAAAA 3540 AGCTGTTCAA GTCGTTTCGT TCACCTCTCA AGCGCCAAGCACGCCCCCGC AGCACGCCGG 3600 AAGTTCTCGG CCCGCGCCAG CATTCCCTGC AACGCAGCCAGTTCAGCCGC AATGCGGTAA 3660 ACGTGGTGGA GCGCCTGCAG AACGCCGGCT ACCAGGCCTATCTGGTCGGC GGCTGCGTAC 3720 GCGACCTGCT GATCGGCGTG CAGCCCAAGG ACTTCGACGTGGCCACCAGC GCCACCCCCG 3780 AGCAGGTGCG GGCCGAGTTT CGCAACGCCC GGGTGATCGGCCGCCGCTTC AAGCTGGCGC 3840 ATGTGCATTT CGGCCGCGAG ATCATCGAGG TGGCGACCTTCCACAGCAAC CACCCGCAGG 3900 GCGACGACGA GGAAGACAGC CACCAGTCGG CCCGTAACGAGAGCGGGCGC ATCCTGCGCG 3960 ACAACGTCTA CGGCAGTCAG GAGAGCGATG CCCAGCGCCGCGACTTCACC ATCAACGCCC 4020 TGTACTTCGA CGTCAGCGGC GAGCGCGTGC TGGACTATGCCCACGGCGTG CACGACATCC 4080 
GCAACCGCCT GATCCGCCTG ATCGGCGACC CCGAGCAGCGCTACCTGGAA GACCCGGTAC 4140 GCATGCTGCG CGCCGTACGC TTCGCCGCCA AGCTGGACTTCGACATCGAG AAACACAGCG 4200 CCGCGCCGAT CCGCCGCCTG GCGCCGATGC TGCGCGACATCCCTGCCGCG CGCCTGTTCG 4260 ACGAGGTGCT CAAGCTGTTC CTCGCCGGCT ACGCCGAGCGCACCTTCGAA CTGCTGCTCG 4320 AGTACGACCT GTTCGCCCCG CTGTTCCCGG CCAGCGCCCGCGCCCTGGAG CGCGATC 4377 17612 base pairs nucleic acid single linearunknown 29 GATCTCGAGG GCGTCGGCTT CGACACCCTG GCGGTGCGCG CCGGTCAGCATCGCACGCCG 60 GAGGGCGAGC ATGGCGAGGC CATGTTCCTC ACCTCCAGCT ATGTGTTCCGCAGCGCCGCC 120 GACGCCGCCG CGCGCTTCGC CGGCGAGCAG CCGGGCAACG TCTACTCGCGCTACACCAAC 180 CCGACCGTGC GCGCCTTCGA GGAGCGCATC GCCGCCCTGG AAGGCGCCGAGCAGGCGGTG 240 GCCACCGCCT CCGGCATGGC CGCCATCCTG GCCATCGTCA TGAGCCTGTGCAGCGCCGGC 300 GACCATGTGC TGGTGTCGCG CAGCGTGTTC GGCTCGACCA TCAGCCTGTTCGAGAAGTAC 360 CTCAAGCGCT TCGGCATCGA GGTGGACTAC CCGCCGCTGG CCGATCTGGACGCCTGGCAG 420 GCAGCCTTCA AGCCCAACAC CAAGCTGCTG TTCGTCGAAT CGCCGTCCAACCCGTTGGCC 480 GAGCTGGTGG ACATAGGCGC CCTGGCCGAG ATCGCCCACG CCCGCGGCGCCCTGCTGGCG 540 GTGGACAACT GCTTCTGCAC CCCGGCCCTG CAGCAGCCGC TGGCGCTGGGCGCCGATATG 600 GTCATGCATT CGGCGACCAA GTTCATCGAT GGCCAGGGCC GCGGCCTGGGCGGCGTGGTG 660 GCCGGGCGCC GTGCGCAGAT GGAGCAGGTG GTCGGCTTCC TGCGCACCGCCGGGCCGACC 720 CTCAGCCCGT TCAACGCCTG GATGTTCCTC AAGGGCCTGG AGACCCTGCGTATCCGCATG 780 CAGGCGCAGA GCGCCAGCGC CCTGGAACTG GCCCGCTGGT TGGAGACCCAGCCGGGCATC 840 GACAGGGTCT ACTATGCCGG CCTGCCCAGC CACCCGCAGC ACGAGCTGGCCAAGCGGCAG 900 CAGAGTGCCT TCGGCGCGGT GCTGAGCTTC GAGGTCAAGG GCGGCAAGGAGGCGGCCTGG 960 CGTTTCATCG ATGCCACCCG GGTGATCTCC ATCACCACCA ACCTGGGCGATACCAAGACC 1020 ACCATCGCCC ATCCGGCGAC CACCTCCCAC GGTCGTCTGT CGCCGCAGGAGCGCGCCAGC 1080 GCCGGTATCC GCGACAACCT GGTGCGTGTC GCCGTGGGCC TGGAAGACGTGGTCGACCTC 1140 AAGGCCGACC TGGCCCGTGG CCTGGCCGCG CTCTGAGGAC GGGGGCCCCCGTTCCTGCCG 1200 CGAAGGGCAG GGGCGGGGGC TTGCGGCGGG CCTTTGCGCG ATCAGCAGCTAGTCTTGGGG 1260 AAACGTCCTA GCCCAGGAGC TACCCCATGA ACCTCATCCT TTTCCTGATCATCGGCGCCG 1320 TTGCCGGCTG GATCGCCGGC AAGTTGCTGC GTGGTGGCGG CTTCGGGCTGATCGGCAACC 1380 
TGGTGGTGGG CATAGTGGGC GCGGTGATCG GCGGCCACCT GTTCAGCTACCTGGGCGTGT 1440 CCGCCGGTGG TGGGCTGATC GGCTCGCTGG TGACCGCGGT GATCGGTGCCCTGGTCCTGC 1500 TGTTCATCGT CGGCCTGATC AAGAAGGCCC AGTAGCGCTG GCGGGACGCCGTCCCGCCGC 1560 CCATCACTGG TCGCGCAGGT CCACGGCACC GGCGCCGGGT TTGTCGAACAGGCGCTCGGC 1620 GCTGCCCGGC AGGCTGCTGT GGCCATCCTC GTCGGCACCC AGCACGCTGATGTCGCTGTA 1680 CTTCTTGCCC GACAGCGCGG CCATGCCGGC GCGGTCGCGG ACGATGGTCGGGCGCAGGAA 1740 CACCATCAGG TTGCGCTTGA CGTGGGTGTC CTTGGTCGAG CGGAACAGCCGGCCGATCAG 1800 CGGGATGTCA CCCAGCAGCG GCACCTTGGA GTCGGTGCTG GTGACGTCGTCCTGGATCAG 1860 CCCTCCCAGC ACTATGACCT GGCCGTCGTC GGCCAGGATC ACGCTCTTGATCGAGCGCTT 1920 GTTGGTCACC AGGTCCACCG CCTGGGCATT GACCCCGGCG CTGGGGGCGATGGAGGAGAT 1980 CTCCTGCTCC ACTTCCAGGC GCAGGGTGGC GCCGTCGTTG ATGTGCGGGGTGACCTTGAG 2040 GGTCACGCCG ATGTCCTCGC GCTCAATGGT GGTGAAGGGG TTGTTCGCCCCCGAGGCGTC 2100 GGTGGTGTAG GAGCCGGTCT GGAAAGGCAC GTTCTGCCCG ACCAGGATTTCCGCCTCCTG 2160 GTTGTCCAGG GTCAGCAGGC TGGGCGTGGA CAGCAGGTTG CTCTTGCTGTTGGCAGAGAG 2220 GGCAGTGATC AGCGCGCCGA AGTTCTCGGT GCCGATGCCG ATGATGGCGCCGTCCGGCAG 2280 GGTCAGGTCA TCGGGGATTT CCTCGTTCTG GATGGCCTTG AGCACGGTGCCCACCGATAG 2340 CCCGGTATTG CCGAAGTTGA CCCCGCCGAG GCCGCCGGTG CCGCCGCGGGCATCCACCGC 2400 CCACTGCACG CCGAGGGCGT CGCTGATGTC CCCGGAGATT TCCACGATGGCCGCCTCGAC 2460 CATCACCTGG GCGCGCGGCA CGTCGAGGTT GCGCACGATT TCCTCGAGGGTCGCCACGGT 2520 GTCCGGATCG GCCAGCAGGA CCAGGGCATT GAGGCTCTCG TCGGCGCGGATCAGGATGTT 2580 CTGCGGCTTG CTGCTGGCGG CTTCGCCACC ACCCTCCGCG GTCTTCAACCCCTCGGAGAT 2640 GTCGCCCAGG GTCTCGGCCA GGCTCTTGGC GTCGCTGTGG CGTAGGCGAATTACCCGCGC 2700 ATTGGCCGAA CGGGTGCTGG GGATGTCCAG CGAGCGGGCC AGGTTGGCCAGGCGCTGGCG 2760 GGCGGCCGGC GGGCCGAGGA GGATCAGGCG GTTGGTGCGG GCGTCGGCAATCACCCGGGT 2820 GCCGGCGCTG TTTTTCTCGT TGCGCATCAC CGCGTTGTTC AGTGCCTCGGCGGCGTCCAG 2880 TACCCAGGCA TGCTGCAGGT TGATCACGTT GTAGTCGCCG CCGCCCTGGGCATCGAGCTC 2940 GGCGATCAGT TCGCGGATGC GTTCGATATT NGCCCGGCGG TCGCTGATGATCAGCGCGTT 3000 GGAGGCGGCG ACCGCCGCCA GGTGGCCGTT CTGCGGCACC AGCGGGCGGATCAGCGGGAT 3060 CAGTTCGTTG ACCGAGGTGT GCTGCACCTG 
GATCAGCTCG GTCTGCACATCGTCCGGCGC 3120 GCTGCGGCTG CTGTTGGCGC CGCTACGCGC CTCGGTGACC GGCACGATGCGCGCCTGGTC 3180 GCCCTGTGCC AGCACGCTGA AGCCATGGGT GCTCATCACC GAAAGGAACAGCTGGTAGAC 3240 CTCCTCGAGG CCCAGCGGGG TCTTGGAGAT CACCGTGACC TGGCCCTTGACCCGCGGATC 3300 GACGACGAAG GTCTCGCCAG AGATCTGCGC CACCTGGTCG ATGAAGTCGCGGATATCGGC 3360 GTCCTTCATG TTGATGGTCC AGGTCTCGGC GCCCTGGCTC ACCGCCACCGGCTCGGCGGC 3420 ATGGACGAGC GGCAGCGGGG CGGCGAGGCA GCTCGCGGCC AGCAGCAGGGCGAGGGGCAG 3480 GCGTTTGTGC GGCGGAATTC TGGAGTCGAT CATGGGCTGT CTTCGGCTTCCGGTATTTCG 3540 GGCTGCGGGA TGTCGCCGCC TTCCATGCGT TGTTGAAGGG TCTGGATGCGCTCCTGCAGG 3600 GCCTGGACGT CTTCGTCCTG CAGCTGTTCC AGTTGGCTGG CGGTGGGCTCCAGCGCCGAG 3660 TAGGCCGGCG TCAGAGAGGG CTGGCGCACG GCGGGGAAGC GCAGGCTCTCCTCGACGCCG 3720 CCGCGGTCGA GCACCACGTG GTCCTGATAG ACGGCCTGCA GGCGGGTGCTGACGTTGACC 3780 GATTCGCCCA CGGCGATGCG CTTGGGTTTG TCGCCGGCGA CCTGGATGATCGCCGTGGAG 3840 CGCTTGGCGT CCGGGTTGAC GAAGCTGGCC AGCAGGGTCA TCTGCTGCCGGGTGGCGGGG 3900 GCGGCCTGGT CGCCGCGCGG CCTGGCCGCG GGCGTGCCGA ACAGATGCTGCAGGCGCTGG 3960 ATGGACAGCG GCTGGCGCTC GGCGATGCTC TCTGGGGCGG GCGGTGGCGCGGCCTCGCTG 4020 CGCAGCAGGC GAAGGAAGTC GATGCTCTGC TTGCTCAGGC TGAGGGTGATGAGCAGCACC 4080 ACGAGCAGGC AGAGGCCGGT CACGCCGTGG CGCTGCAGCC AGGCGGGCAGGCGGGTGCGG 4140 GTGCTACTCA AGGCATGGTT CCCCCGGTGT TCTTCTTATT CTGTGCGGACGCTCTGCTCG 4200 GCGTCTCGCA ATCCGGCCCG TACTCTGCGG GCGCAGGCAA CCTTAACGCAAGTCTCCTGT 4260 CCATGGCGCA CCTGCTTCGT CTATCTGCGC GCTGGCGCAC TGTCCGCCGCTGCCGGAAGC 4320 GTGAAACATT TCGAAACTTT CGGCGAACGA GTCGCTATCA TCGGCCCCACGCGCTTCCCG 4380 TTCAACAATA GCAATAAGCC AGACGGATTA CCGCCATGGA AGATCGCAAGCCGCCTGCCG 4440 CGGCTCCCGT GGGGTTTGCG CGCGCGGAGC TGCTGGAGCT GCTCTGCCGCTGCGAGCAGT 4500 TTCCCCTGAC CCTGCTGCTG GCGCCCGCCG GTTCCGGCAA GTCGACCCTGCTGGCCCAGT 4560 GGCAGGCCAG CCGGCCCTTC GGCAGTGTGG TGCACTATCC ACTGCAGGCGCGTGACAACG 4620 AGCCGGTACG CTTCTTCCGC CACCTGGCCG AAAGCATCCG CGCCCAGGTCGAGGACTTCG 4680 ACCTGTCCTG GTTCAACCCC TTCGCCGCCG AGATGCACCA GGCGCCCGAGGTGCTCGGCG 4740 AGTACCTGGC CGACGCCCTC AATCGCATCG AGAGCCGCCT CTACCTCGTCCTCGACGACT 4800 
TCCAGTGCAT CGGCCAGCCG ATCATCCTCG ACGTGCTCTC GGCCATGCTCGAACGCCTGG 4860 CGGGCAACAC CCGGGTCATT CTGTCCGGGC GCAACCATCC GGGGTTCTCCCTCAGCCGCC 4920 TGAAACTGGA CAACAAGCTG CTGTGCATCG ACCAGCACGA CATGCGCCTGTCGCCAGTGC 4980 AGATCCAACA CCTCAATGCC TACCTGGGCG GTCCCGAGCT CAGCCCGGCCTATGTCGGCA 5040 GCCTGATGGC CATGACCGAG GGCTGGATGG TCGGGGTGAA GATGGCCCTGATGGCCCATG 5100 CGCGCTTCGG CACCGAGGCC CTGCAGCGCT TCGGTGGCGG CCATCCGGAGATAGTCGACT 5160 ACTTCGGCCA TGTGGTGCTG AAGAAGCTGT CGCCGCAGCT GCACGACTTCCTGTTGTGCA 5220 GCGCGATCTT CGAGCGCTTC GACGGCGAGC TATGCGACCG GGTGCTGGATCGCAGCGGTT 5280 CGGCCCTGCT GCTGGAGGAC CTGGCCGCGC GCGAGCTGTT CATGCTGCCGGTGGACGAGT 5340 ATCCCGGCTG CTACCGCTAC CACGCCCTGT TGCACGATTT CCTCGCCCGGCGCCTGGCCG 5400 TGCACAAGCC ACAGGAAGTG GCGCAACTGC ACCGGCGGGC GGCCCTGGCGCTGCAGCAGC 5460 GTGGCGACCT GGAGCTGGCC CTGCAGCATG CCCAGCGCAG TGGCGACCGCGCGTTGTTCC 5520 AAAGCATGCT GGGCGAGGCC TGCGAGCAAT GGGTGCGCAG CGGTCACTTCGCCGAGGTGC 5580 TGAAGTGGCT GGAGCCGCTG AGCGAGGCGG AACTCTGCGN GCAGTCGCGCCTGCTGGTGC 5640 TGATGACCTA TGCCCTGACC CTGTCGCGGC GTTTCCACCA GGCGCGCTACTGCTTGGACG 5700 AACTGGTGGC GCGCTGCACC GGTCAGCCGG GCCTGGAGGA GCCGACCCGCCAGCTGCTGG 5760 CGCTCAACCT GGAGCTGTTC CAGCACGACC TGGCCTTCGA CCCCGGCCAGCGCTGGTCCG 5820 ACCTGCTGGC CGCGGGCGTC GCCTCGGACA TCCGTGCCCT GGCGCTGAGCATCCTCGCCT 5880 ATCACCACCT GATGCACGGC CGCCTGGAGC AGTCGATCCA GCTGGCGCTGGAGGCCAAGG 5940 CGCTGCTGGC CAGCACCGGC CAGCTGTTCC TGGAGAGCTA CGCCGACCTGATCATCGCCC 6000 TGTGCAACCG CAACGCCGGG CGCGCCACCA GCGCGCGCAA GGACGTCTGCCTGGATTACC 6060 AGCGCACCGA GCGCTCCTCG CCGGCCTGGG TCAACCGTGC CACCGCCATGGTGGTGGCGC 6120 TGTACGAGCA GAACCAGCTG GCCGCCGCCC AGCAGCTGTG CGAGGACCTGATGGCCATGG 6180 TCACGTCGTC CTCGGCCACC GAGACCATCG CCACCGTGCA CATCACCCTGTCGCGCCTGC 6240 TCCACCGGCG CCAGTCCCAG GGCCGCGCCA CGCGCCTGCT GGAGCAGCTGTCGCGCATCC 6300 TGCAACTGGG CAACTACGCC CGCTTCGCCA GCCAGGCGGC GCAGGAGAGCATGCGCCAGG 6360 CCTATCTCGA CGGGCGCCCG GCGGCGCTCG ACGCACTGGC CCAACGCCTGGGTATCGAGG 6420 AGCGCCTGGC CGCCGGGGAG TGGGAGAGGG TGCGGCCCTA TGAAGAGTGCTGGGAACGCT 6480 ACGGCCTGGC CGCCGTGTAC TGGCTGGTGA 
TGCGCGGCGC CCAGCCGCGCGCCTGCCGCA 6540 TCCTCAAGGT GCTGGCGCAG GCGNTGNAGA ACAGCGAGAT GAAGGCCCGTGCGCTGGTGG 6600 TGGAGGCCAA CCTGCTGGTG CTGAACGCCC CGCAGCTGGG GGCGGACGAGCAGGACAGGG 6660 CCCTGCTGGC GCTGGTCGAG CGCTTCGGCA TCGTCAACAT CAACCGCTCGGTATTCGACG 6720 AGGCGCCCGG CTTCGCCGAG GCGGTGTTCG GCCTGCTGCG CTCGGGCCGGCTGCAGGCGC 6780 CGGAGGCCTA TCGCGAGGCC TATGCCGACT TCCTCCAGGG CACAGGCCAGGCGCCGCCGG 6840 CGCTCCTGTC CGAGTCGCTG AAACAGCTTA CCGACAAGGA GGCGGCGATCTTCGCCTGCC 6900 TGCTCAGGGG GCTGTCCAAC AGCGAGATCA GCGCCAGCAC CGGCATCGCCCTGTCCACCA 6960 CCAAGTGGCA CCTGAAGAAC ATCTACTCGA AGCTGAGCCT CTCCGGGCGTACCGAAGCCA 7020 TCCTCGCCAT GCAGGCCCGC AACGGATAAT GCGCCATGCC CCTCCCCGGGGAGGGGGGAG 7080 GGGCGCGCGC AACTGCTTAA TCTCCCGCCT GCCGGAAAAG CCGGCAAGCAACCCCATTAG 7140 TACAAGAAGA AATCGGGAGA TATCGCCATG TCTGTTTGGG TCACGTGGCCGGGCTTGGTC 7200 AAGTTCGGCA CCCTGGGCAT CTATGCCGGC CTGATCACGC TCGCGCTTGAGCGCGACGTG 7260 CTGTTCAAGA ACAACCTGTT CGACGTCGAC AACCTGCCCG CGGCCAACGCCAGCATCACC 7320 TGTGATGCCC GCAGCCAGGT GGCGCGTACC GAGGACGGCA CCTGTAACATCCTCGCCAAC 7380 CCGGCCGAGG GCTCGGTGTA CCGCCGCTTC GGGCGCAACG TCGACCCCAGCGTGACCCAT 7440 GGCGAGACCG AGGCCGACAC CCTGCTCAGT CCCAATCCGC GGGAGGTGAGTAACGTGCTG 7500 ATGGCGCGTG GCGAGTTCAA GCCGGCGCCC AGCCTCAACT TCATCGCCGCCTCCTGGATC 7560 CAGTTCATGG TGCATGACTG GGTCGAACAC GGCCCCAACG CCGAAGCCAACCCGATCCAG 7620 GTGCCGCTGC CGGCTGGCGA CGCGCTCGGC TCCGGCAGCC TGTCCGTGCGCCGCACCCAG 7680 CCCGACCCGA CCCGTACCCC GGCCGAGGCC GGCAAGCCGG CCACCTACCGCAACCACAAC 7740 ACCCACTGGT GGGATGGCTC GCAGTTGTAT GGCAGCAGCA AGGACATCAACGACAAGGTG 7800 CGCGCCTTCG AGGGTGGCAA GCTGAAGATC AATCCCGACG GTACCCTGCCGACCGAGTTC 7860 CTCAGCGGCA AGCCGATCAC CGGCTTCAAC GAGAACTGGT GGGTTGGCCTGAGCATGCTG 7920 CACCAGCTGT TCACTAAGGA GCACAACGCC ATCGCGGCGA TGCTCCAGCAGAAGTACCCG 7980 GACAAGGACG ACCAGTGGCT GTACGACCAT GCGCGCCTGG TCAACTCCGCGCTGATGGCC 8040 AAGATCCACA CCGTGGAATG GACCCCGGCG GTGATCGCCA ACCCGGTCACCGAACGCGCC 8100 ATGTATGCCA ACTGGTGGGG CCTGCTGGGT TCCGGTCCGG AGCGTGACAAGTACCAGGAA 8160 GAGGCGCGCA TGCTGCAGGA GGACCTGGCC AGCTCCAACT CCTTCGTCCTGCGCATTCTC 8220 
GGCATCGACG GCAGCCAGGC CGGCAGTTCG GCCATCGACC ATGCCCTGGCCGGCATCGTC 8280 GGCTCGACCA ACCCGAACAA CTACGGCGTG CCCTACACCC TGACCGAGGAGTTCGTCGCG 8340 GTCTACCGCA TGCACCCGCT GATGCGCGAC AAGGTCGATG TCTACGACATCGGCTCGAAC 8400 ATCATCGCGC GCAGCGTGCC GCTGCAGGAG ACCCGCGATG CCGACGCCGAGGAGCTGCTG 8460 GCGGACGAGA ATCCCGAGCG CCTGTGGTAC TCCTTCGGCA TCACCAACCCGGGCTCGCTG 8520 ACCCTCAACA ACTACCCGAA CTTCCTGCGC AACCTGTCCA TGCCGCTGGTCGGCAACATC 8580 GACCTGGCGA CCATCGACGT GCTGTGTGAC CGCGAGCGCG GGGTGCCGCGCTACAACGAG 8640 TTCCGCCGCG AGATCGGCCT CAACCCGATC ACCAAGTTGG AGGACCTGACCACCGACCCG 8700 GCCACCCTGG CCAACCTCAA GCGCATCTAC GGCAACGACA TCGAGAAGATTGACACCCTG 8760 GTCGGCATGC TGGCCGAGAC CGTGCGTCCG GACGGCTTCG CCTTCGGCGAGACGGCCTTC 8820 CAGATCTTCA TCATGAACGC CTCGCGGCGC CTGATGACCG ACCGCTTCTATACCAAGGAC 8880 TACCGCCCGG AGATCTACAC CGCCGAGGGC CTGGCCTGGG TCGAGAACACCACCATGGTC 8940 GACGTGCTCA AACGCCACAA TCCGCAGCTG GTCAACAGCC TGGTTGGCGTGGAAAACGCC 9000 TTCAAACCCT GGGGCCTGAA CATCCCGGCC GACTACGAGA GCTGGCCGGGCAAGGCCAAG 9060 CAGGACAACC TGTGGGTCAA CGGCGCCNTG CGCACCCAGT ACGCCGCAGGCCAGCTGCCG 9120 GCCATTCCGC CGGTGGACGT CGGCGGCCTG ATCAGTTCGG TGCTGTGGAAGAAGGTGCAG 9180 ACCAANTCCG ACGTGGCGCC GGCCGGCTAC GAGAAGGCCA TGCACCCGCATGGCGTGATG 9240 GCCAAGGTCA AGTTCACCGC CGTGCCGGGG CACCCCTACA CCGGCCTGTTCCAGGGTGCC 9300 GACAGCGGCC TGCTGCGCCT GTCGGTGGCC GGCGACCCGG CAACCAACGGCTTCCAGCCG 9360 GGTCTGGCGT GGAAGGCCTT CGTCGACGGC AAGCCGTCGC AGAACGTCTCCGCGCTCTAC 9420 ACCCTGAGCG GGCAGGGCAG CAACCACAAC TTCTTCGCCA ACGAGCTGTCGCAGTTCGTC 9480 CTGCCGGAGA CCAACGATAC CCTGGGCACC ACGCTGCTGT TCTCGCTGGTCAGCCTCAAG 9540 CCGACCTTGC TGCGCGTGGA CGACATGGCC GAAGTGACCC AGACCGGCCAGGCCGTGACT 9600 TCGGTCAAGG CGCCGACGCA GATCTACTTC GTGCCCAAGC CGGAGCTGCGCAGCCTGTTC 9660 TCCAGTGCGG CGCATGACTT CCGCAGCGAC CTGACGAGCC TCACCGCCGGCACCAAGCTG 9720 TACGACGTCT ACGCTACCTC GATGGAGATC AAGACCTCGA TCCTGCCGTCGACCAATCGT 9780 AGCTACGCCC AGCAACGGCG CAACAGCGCG GTGAAGATCG GCGAGATGGAGCTGACCTCG 9840 CCGTTCATCG CCTCGGCCTT CGGCGACAAC GGGGTGTTCT TCAAGCACCAGCGTCACGAA 9900 GACAAATAAG GGTCATCCCT TGCTGAACAG 
CCCCGGCCCG TGCCGGGGCTTTTTTGTGCA 9960 CGCCTTACGT CCATCACACT TCTGCGCCAG GCTGTGCTGC CGCCTGCAAAATCGGCACTG 10020 CAGTTTTTGC GCAAATCCGT TAACTTGGCG CCTCGGCCAT GCCATAAAAACAACAAGAAC 10080 AACAGCAAGA TGGATCTTCT GTTCGGGGAA CGCATCCGCC CATGTCCACCGATACCCACG 10140 CCGCCCTGAC GGCTCCCGCA AGCCCCGCCT TGCGCCCGCT GCCCTTCGCCTTCGCCAAAC 10200 GCCACGGCGT GCTGCTGCGC GAGCCCTTCG GCCAGGTCCA GCTGCAGGTGCGCCGCGGTG 10260 CCAGCCTGGC CGCCGTGCAG GAGGCCCAGC GCTTCGCCGG CCGCGTGCTGCCGCTGCACT 10320 GGCTGGAGCC CGAGGCCTTC GAGCAGGAGC TGGCCCTGGC CTACCAGCGCGACTCCTCCG 10380 AGGTGCGGCA GATGGCCGAG GGCATGGGTG CCGAACTTGA CCTAGCCAGCCTGGCCGAAC 10440 TCACTCCCGA ATCCGGCGAC CTGCTGGAGC AGGAAGATGA CGCGCCGATCATCCGCCTGA 10500 TCAACGCCAT CCTCAGCGAG GCGATCAAGG CCGGCGCCTC CGACATCCACCTGGAAACCT 10560 TCGAGAAACG CCTGGTGGTG CGCTTTCGCG TCGACGGCAT CCTCCGCGAAGTGATCGAAC 10620 CGCGCCGCGA GCTGGCGGCG CTGCTGGTCT CGCGGGTCAA GGTCATGGCGCGCCTGGACA 10680 TCGCCGAGAA GCGCGTACCG CAGGACGGCC GTATTTCGCT CAAGGTCGGCGGTCGCGAGG 10740 TGGATATCCG CGTCTCCACC CTGCCGTCGG CCAACGGCGA GCGGGTGGTGCTGCGTCTGC 10800 TCGACAAGCA GGCCGGGCGC CTGTCGCTCA CGCATCTGGG CATGAGCGAGCGCGACCGCC 10860 GCCTGCTCGA CGACAACCTG CGCAAGCCGC ACGGCATCAT CCTAGTCACCGGCCCCACCG 10920 GCTCGGGCAA GACCACCACC CTGTACGCCG GCCTGGTCAC CCTCAACGACCGCTCGCGCA 10980 ATATCCTCAC GGTGGAAGAC CCGATCGAGT ACTACCTGGA AGGCATCGGCCAGACCCAGG 11040 TCAACCCGCG GGTGGACATG ACCTTCGCCC GCGGCCTGCG CGCCATCCTGCGCCAGGACC 11100 CGGACGTGGT GATGGTCGGC GAGATCCGCG ACCAGGAGAC CGCCGACATCGCCGTGCAGG 11160 CCTCGCTCAC CGGCCACCTG GTGCTCTCCA CCCTGCACAC CAACAGCGCCGTCGGCGCCG 11220 TCACCCGCCT GGTCGACATG GGCGTCGAGC CCTTCCTGCT GTCGTCGTCCCTGCTCGGCG 11280 TGCTGGCCCA GCGCCTGGTG CGCGTGCTCT GCGTGCACTG CCGCGAGGCGCGCCCGGCTG 11340 ACGCGGCCGA GTGCGGCCTG CTCGGCCTCG ACCCGCACAG CCAGCCCCTGATCTACCACG 11400 CCAAGGGCTG CCCGGAGTGC CACCAGCAGG GCTACCGCGG CCGTACTGGCATCTACGAGC 11460 TGGTGATCTT CGACGACCAG ATGCGCACCC TGGTGCACAA CGGCGCCGGTGAGCAGGAGC 11520 TGATTCGCCA CGCCCGCAGC CTCGGCCCGA GCATCCGCGA CGATGGCCGGCGCAAGGTGC 11580 TGGAAGGGGT GACCAGCCTG GAAGAAGTGT TGCGCGTGAC 
CCGGGAAGACTGATGGCCGC 11640 CTTCGAATAC ATCGCCCTGG ATGCCAGGGG CCGCCAGCAG AAGGGCGTGCTGGAGGGCGA 11700 CAGCGCCCGC CAGGTGCGCC AGCTGCTGCG CGACAAACAG TTGTCGCCGCTGCAGGTCGA 11760 GCCGGTACAG CGCAGGGAGC AGGCCGAGGC TGGTGGCTTC AGCCTGCGCCGTGGCCTGTC 11820 GGCGCGCGAC CTGGCGCTGG TCACCCGTCA GCTGGCGACC CTGATCGGCGCCGCGCTGCC 11880 CATCGAGGAA GCGCTGCGCG CCGCCGCCGC GCAGTCGCGC CAGCCGCGCATCCAGTCGAT 11940 GCTGTTGGCG GTGCGCGCCA AGGTGCTCGA GGGCCACAGC CTGGCCAAGGCCCTGGCCTC 12000 CTACCCGGCG GCCTTCCCCG AGCTGTACCG CGCCACGGTG GCGGCCGGCGAGCATGCGGG 12060 GCACCTGGCG CCGGTGCTGG AGCAGCTGGC CGACTACACC GAGCAGCGCCAGCAGTCGCG 12120 GCAGAAGATC CAGATGGCGC TGCTCTACCC GGTGATCCTG ATGCTCGCTTCGCTGGGCAT 12180 CGTCGGTTTT CTGCTCGGCT ACGTGGTGCC GGATGTGGTG CGGGTGTTCGTCGACTCCGG 12240 GCAGACCCTG CCGGCGCTGA CCCGCGGGCT GATTTTCCTC AGCGAGCTGGTCAAGTCCTG 12300 GGGCGCCCTG GCCATCGTCC TGGCGGTGCT CGGCGTGCTC GCCTTTCGCCGCGCCTTGCG 12360 CAGCGAGGAT CTGCGCCGGC GCTGGCATGC CTTCCTGCTG CGCGTGCCGCTGGTCGGTGG 12420 GCTGATCGCC GCCACCGAGA CGGCACGCTT CGCCTCGACC CTGGCCATCCTGGTGCGCAG 12480 CGGCGTGCCA CTGGTGGAGG CGCTGGCCAT CGGCGCCGAG GTGGTGTCCAACCTGATCAT 12540 CCGCAGCGAC GTGGCCAACG CCACCCAGCG CGTGCGCGAG GGCGGCAGCCTGTCGCGCGC 12600 GCTGGAAGCC AGCCGGCAGT TTCCGCCGAT GATGCTGCAC ATGATCGCCAGCGGCGAGCG 12660 TTCCGGCGAG CTGGACCAGA TGCTGGCGCG CACGGCGCGC AACCAGGAAAACGACCTGGC 12720 GGCCACCATC GGCCTGCTGG TGGGGCTGTT CGAGCCGTTC ATGCTGGTATTCATGGGCGC 12780 GGTGGTGCTG GTGATCGTGC TGGCCATCCT GCTGCCGATT CTTTCTCTGAACCAACTGGT 12840 GGGTTGATAG CGATGTACAA ACAGAAAGGC TTCACGCTGA TCGAAATCATGGTGGTGGTG 12900 GTCATCCTCG GCATTCTCGC TGCCCTGGTG GTGCCGCAGG TGATGGGCCGCCCGGACCAG 12960 GCCAAGGTCA CCGCGGCGCA GAACGACATC CGCGCCATCG GCGCCGCGCTGGACATGTAC 13020 AAGCTGGACA ACCAGAACTA CCCGAGCACC CAGCAGGGCC TGGAGGCCCTGGTGAAGAAA 13080 CCCACCGGCA CGCCGGCGGC GAAGAACTGG AACGCCGAGG GCTACCTGAAGAAGCTGCCG 13140 GTCGACCCCT GGGGCAACCA GTACCTGTAC CTGTCGCCGG GCACCCGCGGCAAGATCGAC 13200 CTGTATTCGC TGGGCGCCGA CGGCCAGGAA GGCGGCGAGG GGACCGACGCCGACATCGGC 13260 AACTGGGATC TCTGACTCGC AATGCAGCGG GGGCGCGGTT 
TCACTCTGATCGAGCTGCTG 13320 GTGGTGCTGG TGCTGCTGGG CGTGCTCACC GGCCTCGCCG TGCTCGGCAGCGGGATCGCC 13380 AGCAGCCCCG CGCGCAAGCT GGCGGACGAG GCCGAGCGCC TGCAGTCGCTGCTGCGGGTG 13440 CTGCTCGACG AGGCGGTGCT GGACAACCGC GAGTATGGCG TACGCTTCGACGCCCGGAGC 13500 TACCGGGTGC TGCGCTTCGA GCCGCGCACG GCGCGCTGGG AGCCGCTCGACGAGCGCGTG 13560 CACGAGCTGC CGGAGTGGCT CGAGCTGGAG ATCGAGGTCG ACGAGCAGAGTGTCGGGCTG 13620 CCCGCCGCCC GTGGCGAGCA GGACAAAGCC GCGGCCAAGG CGCCACAGCTGCTGCTGCTC 13680 TCCAGTGGCG AGCTGACCCC CTTCGCCCTG CGCCTGTCCG CCGGCCGCGAGCGCGGCGCG 13740 CCGGTGCTGA CGCTGGCCAG CGACGGCTTC GCCGAGCCCG AGCTGCAGCAGGAAAAGTCC 13800 CGATGAAGCG CGGCCGCGGC TTCACCCTGC TCGAGGTGCT GGTGGCCCTGGCGATCTTCG 13860 CCGTGGTCGC CGCCAGCGTG CTCAGCGCCA GCGCTCGCTC GCTGAAGACCGCCGCGCGCC 13920 TGGAGGACAA GACCTTCGCC ACCTGGCTGG CGGACAACCG CCTGCAGGAGCTGCAGCTGG 13980 CCGACGTGCC GCCGGGCGAG GGCCGCGAGC AGGGCGAGGA GAGCTACGCCGGGCGGCGCT 14040 GGCTGTGGCA GAGCGAGGTG CAGGCCACCA GCGAGCCGGA GATGCTGCGTGTCACCGTAC 14100 GGGTGGCGCT GCGGCCGGAG CGCGGGCTGC AGGGCAAGAT CGAAGACCATGCCCTGGTGA 14160 CCCTGAGTGG CTTCGTCGGG GTCGAGCCAT GAGGCAGCGC GGCTTCACCCTGCTGGAAGT 14220 GCTGATCGCC ATCGCCATCT TCGCCCTGCT GGCCATGGCC ACCTACCGCATGCTCGACAG 14280 CGTGCTGCAG ACCGATCGTG GCCAGCGCCA GCAGGAGCAG CGTCTGCGCGAGCTGACGCG 14340 GGCCATGGCA GCTTTCGAAC GCGACCTGCT GCAGGTGCGC CTGCGTCCGGTGCGCGACCC 14400 GCTGGGCGAC CTGCTGCCAG CCCTGCGCGG CAGCAGTGGC CGCGACACCCAGCTGGAGTT 14460 CACCCGCAGC GGCTGGCGCA ACCCGCTCGG CCAGCCGCGC GCCACCCTACAGCGGGTGCG 14520 CTGGCAGCTC GAAGGCGAGC GCTGGCAGCG CGCTTACTGG ACGGTGCTGGACCAGGCCCA 14580 GGACAGCCAG CCGCGGGTGC AGCAGGCGCT GGATGGCGTG CGCCGCTTCGACTTGCGCTT 14640 TCTCGACCAG GAGGGGCGCT GGCTGCAGGA CTGGCCGCCG GCCAACAGTGCTGCCGACGA 14700 GGCCCTGACC CAGCTGCCGC GTGCCGTCGA GCTGGTCGTC GAGCACCGCCATTACGGTGA 14760 ACTGCGCCGT CTCTGGCGCT TGCCCGAGAT GCCGCAGCAG GAACAGATCACGCCGCCCGG 14820 GGGCGAGCAG GGCGGTGAGC TGCTGCCGGA AGAGCCGGAG CCCGAGGCATGAGCCGGCAG 14880 CGCGGCGTGG CACTGATCAC CGTGCTGCTG GTGGTGGCGC TGGTGACCGTGGTCTGCGCG 14940 GCCCTGCTGC TGCGCCAGCA GCTGGCCATC CGCAGCACCG 
GCAACCAGCTGCTGGTGCGC 15000 CAGGCCCAGT ACTACGCCGA AGGCGGCGAG CTGCTGGCCA AGGCCCTGCTGCGTCGCGAC 15060 CTGGCCGCCG ACCAGGTCGA TCATCCCGGC GAGCCCTGGG CCAACCCCGGCCTGCGCTTC 15120 CCCCTGGATG AGGGCGGCGA GCTGCGCCTG CGCATCGAGG ACCTGGCCGGACGTTTCAAC 15180 CTCAACAGCC TGGCCGCCGG TGGTGAGGCC GGTGAGTTGG CGCTGCTGCGCCTGCGGCGC 15240 CTGCTGCAGC TGCTGCAGCT GACCCCGGCC TATGCCGAGC GCCTGCAGGACTGGCTCGAC 15300 GGCGATCAGG AGGCCAGCGG CATGGCCGGC GCCGAGGATG ACCAGTACCTGCTGCAGAAA 15360 CCGCCCTACC GTACCGGCCC CGGGCGCATT GCCGAGGTGT CGGAGCTGCGCCTGCTGCTG 15420 GGCATGAGCG AGGCCGACTA CCGCCGCCTG GCCCCCTTCG TCAGCGCCCTGCCGAGCCAG 15480 GTCGAGCTGA ACATCAACAC CGCCAGCGCC CTGGTGCTGG CTTGCCTGGGCGAGGGCATN 15540 CCCGAGGCGG TGCTCGAGGC CGCCATCGAN GGTCGCGGCC GCAGCGGCTATCGCGAGCCC 15600 GCTGCCTTCG TCCAGCANCT TGCCAGCTAC GGCGTCAGCC CGCAGGGGCTGGGCATCGCC 15660 AGCCAGTATT TCCGTGTCAC CACCGAGGTG CTGCTGGGTG AGCGGCGCCAGGTGCTGGCC 15720 AGTTATCTGC AACGTGGTAA TGATGGGCGC GTCCGCCTGA TGGCGCGCGATCTGGGGCAG 15780 GAGGGCCTGG CGCCCCCACC CGTCGAGGAG TCCGAGAAAT GAGTCTGCTCACCCTGTTTC 15840 TGCCGCCCCA GGCCTGCACC GAGGCGAGCG CCGACATGCC GGTGTGGTGCGTCGAGAGCG 15900 ACAGCTGCCG TCAGCTGCCC TTCGCCGAGG CCTTGCCGGC CGACGCGCGGGTCTGGCGCT 15960 TGGTGCTGCC GGTGGAGGCG GTGACCACCT GTGTCGTGCA GTTGCCGACCACCAAGGCAC 16020 GCTGGCTGGC CAAGGCCCTG CCGTTCGCCG TCGAGGAGCT GCTGGCCGAGGAGGTGGAGC 16080 AGTTTCACCT GTGCGTCGGT AGCGCGCTGG TCGATGGTCG TCATCGTGTTCATGCCCTGC 16140 GCCGCGAGTG GCTGGCCGGC TGGCTGGCGC TGTGCGGCGA GCGGCCGCCGCAGTGGATCG 16200 AGGTGGACGC CGACCTGTTG CCGGAGGAGG GTAGCCAGCT GCTCTGCCTGGGCGAGCGCT 16260 GGTTGCTCGG CGGGTCGGGC GAGGCGCGCC TGGCCCTGCG TGGCGAGGACTGGCCGCAGC 16320 TGGCGGCGCT CTGTCCGCCG CCCCGGCAAG CCTATGTGCC GCCCGGGCAGGCGGCGCCGC 16380 CGGGCGTCGA GGCCTGCCAG ACGCTGGAGC AGCCGTGGCT CTGGCTGGCCGCGCAGAAGT 16440 CCGGCTGCAA CCTGGCCCAG GGGCCTTTCG CCCGTCGCGA GCCTTCCGGCCAGTGGCAGC 16500 GCTGGCGGCC GCTGGCGGGG CTGCTCGGTC TCTGGCTGGT GCTGCAKTGGGGCTTCAACC 16560 TTGCCCANGG CTGGCAGCTG CAGCGCGAGG GTGAACGCTA TGCCGTGGCCAACGAGGCGC 16620 TGTATCGCGA GCTGTTCCCC GAGGATCGCA AGGTGATCAA 
CCTGCGTGCGCAGTTCGACC 16680 AGCACCTGGC CGAGGCGGCT GGGAGCGGCC AGAGCCAGTT GCTGGCCCTGCTCGATCAGG 16740 CCGCCGCGGC CATCGGCGAA GGGGGGGCGC AGGTGCAGGT GGATCAGCTCGACTTCAACG 16800 CCCAGCGTGG CGACCTGGCC TTCAACCTGC GTGCCAGCGA CTTCGCCGCGCTGGAAAGCC 16860 TGCGGGCGCG CCTGCAGGAG GCCGGCCTGG CGGTGGACAT GGGCTCGGCGAGCCGCGAGG 16920 ACAACGGCGT CAGTGCGCGC CTGGTGATCG GGGGTAACGG ATGAACGGCCTGCTCATGCA 16980 ATGGCAAGCG CGCCTGGCGC AGAACCCTTT GATGCTGCGC TGGCAGGGCCTGCCGCCACG 17040 CGACCGGCTG GCCCTGGGCC TGCTCGCTGC CTTCCTGTTG CTGGTGCTGCTGTACCTGTT 17100 GCTGTGGCGG CCGGTCAGCC AGAACCTGGA GCGGGCGCGC GGCTTCCTGCAGCAGCAGCG 17160 TACGCTGCAC GCCTACCTGC AGGAGCATGC ACCGCAGGTG CGGGCACGGCAGGTCGCACC 17220 GCAGGCCAGT ATCGAGCCTG CCGCGCTGCA GGGGTTGGTG ACCGCCAGTGCCGCCAGCCA 17280 GGGGCTGAAT GTCGAGCGTC TGGACAACCA GGGTGATGGT GGCCTGCAGGTGAGCCTGCA 17340 GCCGGTCGAG TTCGCCCGTC TGCTGCAGTG GCTGGTGAGC CTGCAGGAGCAGGGCGTGCG 17400 CGTCGAAGAG GCCGGTCTGG AACGTGCCGA CAAGGGGCTG GTGAGCAGCCGCCTGCTGCT 17460 GCGTGCCGGT TGAGCCCGGC TGCACCAGGC GAGTGCGTCG GCACTCGCGCGGAGCATCTG 17520 GAAAACCCGT CCGCGAAGAA AAATTCAAGC AGGGTGTTGA CTTAGCTATGACCTCTNCGT 17580 CAATTGCGCG CCTCGCANGC TAACGGCTGG AT 17612 2634 basepairs nucleic acid single linear unknown 30 ATGGAAGATC GCAAGCCGCCTGCCGCGGCT CCCGTGGGGT TTGCGCGCGC GGAGCTGCTG 60 GAGCTGCTCT GCCGCTGCGAGCAGTTTCCC CTGACCCTGC TGCTGGCGCC CGCCGGTTCC 120 GGCAAGTCGA CCCTGCTGGCCCAGTGGCAG GCCAGCCGGC CCTTCGGCAG TGTGGTGCAC 180 TATCCACTGC AGGCGCGTGACAACGAGCCG GTACGCTTCT TCCGCCACCT GGCCGAAAGC 240 ATCCGCGCCC AGGTCGAGGACTTCGACCTG TCCTGGTTCA ACCCCTTCGC CGCCGAGATG 300 CACCAGGCGC CCGAGGTGCTCGGCGAGTAC CTGGCCGACG CCCTCAATCG CATCGAGAGC 360 CGCCTCTACC TCGTCCTCGACGACTTCCAG TGCATCGGCC AGCCGATCAT CCTCGACGTG 420 CTCTCGGCCA TGCTCGAACGCCTGGCGGGC AACACCCGGG TCATTCTGTC CGGGCGCAAC 480 CATCCGGGGT TCTCCCTCAGCCGCCTGAAA CTGGACAACA AGCTGCTGTG CATCGACCAG 540 CACGACATGC GCCTGTCGCCAGTGCAGATC CAACACCTCA ATGCCTACCT GGGCGGTCCC 600 GAGCTCAGCC CGGCCTATGTCGGCAGCCTG ATGGCCATGA CCGAGGGCTG GATGGTCGGG 660 GTGAAGATGG CCCTGATGGCCCATGCGCGC TTCGGCACCG AGGCCCTGCA 
GCGCTTCGGT 720 GGCGGCCATC CGGAGATAGTCGACTACTTC GGCCATGTGG TGCTGAAGAA GCTGTCGCCG 780 CAGCTGCACG ACTTCCTGTTGTGCAGCGCG ATCTTCGAGC GCTTCGACGG CGAGCTATGC 840 GACCGGGTGC TGGATCGCAGCGGTTCGGCC CTGCTGCTGG AGGACCTGGC CGCGCGCGAG 900 CTGTTCATGC TGCCGGTGGACGAGTATCCC GGCTGCTACC GCTACCACGC CCTGTTGCAC 960 GATTTCCTCG CCCGGCGCCTGGCCGTGCAC AAGCCACAGG AAGTGGCGCA ACTGCACCGG 1020 CGGGCGGCCC TGGCGCTGCAGCAGCGTGGC GACCTGGAGC TGGCCCTGCA GCATGCCCAG 1080 CGCAGTGGCG ACCGCGCGTTGTTCCAAAGC ATGCTGGGCG AGGCCTGCGA GCAATGGGTG 1140 CGCAGCGGTC ACTTCGCCGAGGTGCTGAAG TGGCTGGAGC CGCTGAGCGA GGCGGAACTC 1200 TGCGNGCAGT CGCGCCTGCTGGTGCTGATG ACCTATGCCC TGACCCTGTC GCGGCGTTTC 1260 CACCAGGCGC GCTACTGCTTGGACGAACTG GTGGCGCGCT GCACCGGTCA GCCGGGCCTG 1320 GAGGAGCCGA CCCGCCAGCTGCTGGCGCTC AACCTGGAGC TGTTCCAGCA CGACCTGGCC 1380 TTCGACCCCG GCCAGCGCTGGTCCGACCTG CTGGCCGCGG GCGTCGCCTC GGACATCCGT 1440 GCCCTGGCGC TGAGCATCCTCGCCTATCAC CACCTGATGC ACGGCCGCCT GGAGCAGTCG 1500 ATCCAGCTGG CGCTGGAGGCCAAGGCGCTG CTGGCCAGCA CCGGCCAGCT GTTCCTGGAG 1560 AGCTACGCCG ACCTGATCATCGCCCTGTGC AACCGCAACG CCGGGCGCGC CACCAGCGCG 1620 CGCAAGGACG TCTGCCTGGATTACCAGCGC ACCGAGCGCT CCTCGCCGGC CTGGGTCAAC 1680 CGTGCCACCG CCATGGTGGTGGCGCTGTAC GAGCAGAACC AGCTGGCCGC CGCCCAGCAG 1740 CTGTGCGAGG ACCTGATGGCCATGGTCACG TCGTCCTCGG CCACCGAGAC CATCGCCACC 1800 GTGCACATCA CCCTGTCGCGCCTGCTCCAC CGGCGCCAGT CCCAGGGCCG CGCCACGCGC 1860 CTGCTGGAGC AGCTGTCGCGCATCCTGCAA CTGGGCAACT ACGCCCGCTT CGCCAGCCAG 1920 GCGGCGCAGG AGAGCATGCGCCAGGCCTAT CTCGACGGGC GCCCGGCGGC GCTCGACGCA 1980 CTGGCCCAAC GCCTGGGTATCGAGGAGCGC CTGGCCGCCG GGGAGTGGGA GAGGGTGCGG 2040 CCCTATGAAG AGTGCTGGGAACGCTACGGC CTGGCCGCCG TGTACTGGCT GGTGATGCGC 2100 GGCGCCCAGC CGCGCGCCTGCCGCATCCTC AAGGTGCTGG CGCAGGCGNT GNAGAACAGC 2160 GAGATGAAGG CCCGTGCGCTGGTGGTGGAG GCCAACCTGC TGGTGCTGAA CGCCCCGCAG 2220 CTGGGGGCGG ACGAGCAGGACAGGGCCCTG CTGGCGCTGG TCGAGCGCTT CGGCATCGTC 2280 AACATCAACC GCTCGGTATTCGACGAGGCG CCCGGCTTCG CCGAGGCGGT GTTCGGCCTG 2340 CTGCGCTCGG GCCGGCTGCAGGCGCCGGAG GCCTATCGCG AGGCCTATGC CGACTTCCTC 2400 CAGGGCACAG 
GCCAGGCGCCGCCGGCGCTC CTGTCCGAGT CGCTGAAACA GCTTACCGAC 2460 AAGGAGGCGG CGATCTTCGCCTGCCTGCTC AGGGGGCTGT CCAACAGCGA GATCAGCGCC 2520 AGCACCGGCA TCGCCCTGTCCACCACCAAG TGGCACCTGA AGAACATCTA CTCGAAGCTG 2580 AGCCTCTCCG GGCGTACCGAAGCCATCCTC GCCATGCAGG CCCGCAACGG ATAA 2634 877 amino acids amino acidsingle linear unknown 31 Met Glu Asp Arg Lys Pro Pro Ala Ala Ala Pro ValGly Phe Ala Arg 1 5 10 15 Ala Glu Leu Leu Glu Leu Leu Cys Arg Cys GluGln Phe Pro Leu Thr 20 25 30 Leu Leu Leu Ala Pro Ala Gly Ser Gly Lys SerThr Leu Leu Ala Gln 35 40 45 Trp Gln Ala Ser Arg Pro Phe Gly Ser Val ValHis Tyr Pro Leu Gln 50 55 60 Ala Arg Asp Asn Glu Pro Val Arg Phe Phe ArgHis Leu Ala Glu Ser 65 70 75 80 Ile Arg Ala Gln Val Glu Asp Phe Asp LeuSer Trp Phe Asn Pro Phe 85 90 95 Ala Ala Glu Met His Gln Ala Pro Glu ValLeu Gly Glu Tyr Leu Ala 100 105 110 Asp Ala Leu Asn Arg Ile Glu Ser ArgLeu Tyr Leu Val Leu Asp Asp 115 120 125 Phe Gln Cys Ile Gly Gln Pro IleIle Leu Asp Val Leu Ser Ala Met 130 135 140 Leu Glu Arg Leu Ala Gly AsnThr Arg Val Ile Leu Ser Gly Arg Asn 145 150 155 160 His Pro Gly Phe SerLeu Ser Arg Leu Lys Leu Asp Asn Lys Leu Leu 165 170 175 Cys Ile Asp GlnHis Asp Met Arg Leu Ser Pro Val Gln Ile Gln His 180 185 190 Leu Asn AlaTyr Leu Gly Gly Pro Glu Leu Ser Pro Ala Tyr Val Gly 195 200 205 Ser LeuMet Ala Met Thr Glu Gly Trp Met Val Gly Val Lys Met Ala 210 215 220 LeuMet Ala His Ala Arg Phe Gly Thr Glu Ala Leu Gln Arg Phe Gly 225 230 235240 Gly Gly His Pro Glu Ile Val Asp Tyr Phe Gly His Val Val Leu Lys 245250 255 Lys Leu Ser Pro Gln Leu His Asp Phe Leu Leu Cys Ser Ala Ile Phe260 265 270 Glu Arg Phe Asp Gly Glu Leu Cys Asp Arg Val Leu Asp Arg SerGly 275 280 285 Ser Ala Leu Leu Leu Glu Asp Leu Ala Ala Arg Glu Leu PheMet Leu 290 295 300 Pro Val Asp Glu Tyr Pro Gly Cys Tyr Arg Tyr His AlaLeu Leu His 305 310 315 320 Asp Phe Leu Ala Arg Arg Leu Ala Val His LysPro Gln Glu Val Ala 325 330 335 Gln Leu His Arg Arg Ala Ala Leu Ala LeuGln Gln Arg Gly Asp Leu 340 345 350 Glu Leu Ala Leu Gln His Ala Gln ArgSer 
Gly Asp Arg Ala Leu Phe 355 360 365 Gln Ser Met Leu Gly Glu Ala CysGlu Gln Trp Val Arg Ser Gly His 370 375 380 Phe Ala Glu Val Leu Lys TrpLeu Glu Pro Leu Ser Glu Ala Glu Leu 385 390 395 400 Cys Xaa Gln Ser ArgLeu Leu Val Leu Met Thr Tyr Ala Leu Thr Leu 405 410 415 Ser Arg Arg PheHis Gln Ala Arg Tyr Cys Leu Asp Glu Leu Val Ala 420 425 430 Arg Cys ThrGly Gln Pro Gly Leu Glu Glu Pro Thr Arg Gln Leu Leu 435 440 445 Ala LeuAsn Leu Glu Leu Phe Gln His Asp Leu Ala Phe Asp Pro Gly 450 455 460 GlnArg Trp Ser Asp Leu Leu Ala Ala Gly Val Ala Ser Asp Ile Arg 465 470 475480 Ala Leu Ala Leu Ser Ile Leu Ala Tyr His His Leu Met His Gly Arg 485490 495 Leu Glu Gln Ser Ile Gln Leu Ala Leu Glu Ala Lys Ala Leu Leu Ala500 505 510 Ser Thr Gly Gln Leu Phe Leu Glu Ser Tyr Ala Asp Leu Ile IleAla 515 520 525 Leu Cys Asn Arg Asn Ala Gly Arg Ala Thr Ser Ala Arg LysAsp Val 530 535 540 Cys Leu Asp Tyr Gln Arg Thr Glu Arg Ser Ser Pro AlaTrp Val Asn 545 550 555 560 Arg Ala Thr Ala Met Val Val Ala Leu Tyr GluGln Asn Gln Leu Ala 565 570 575 Ala Ala Gln Gln Leu Cys Glu Asp Leu MetAla Met Val Thr Ser Ser 580 585 590 Ser Ala Thr Glu Thr Ile Ala Thr ValHis Ile Thr Leu Ser Arg Leu 595 600 605 Leu His Arg Arg Gln Ser Gln GlyArg Ala Thr Arg Leu Leu Glu Gln 610 615 620 Leu Ser Arg Ile Leu Gln LeuGly Asn Tyr Ala Arg Phe Ala Ser Gln 625 630 635 640 Ala Ala Gln Glu SerMet Arg Gln Ala Tyr Leu Asp Gly Arg Pro Ala 645 650 655 Ala Leu Asp AlaLeu Ala Gln Arg Leu Gly Ile Glu Glu Arg Leu Ala 660 665 670 Ala Gly GluTrp Glu Arg Val Arg Pro Tyr Glu Glu Cys Trp Glu Arg 675 680 685 Tyr GlyLeu Ala Ala Val Tyr Trp Leu Val Met Arg Gly Ala Gln Pro 690 695 700 ArgAla Cys Arg Ile Leu Lys Val Leu Ala Gln Ala Xaa Xaa Asn Ser 705 710 715720 Glu Met Lys Ala Arg Ala Leu Val Val Glu Ala Asn Leu Leu Val Leu 725730 735 Asn Ala Pro Gln Leu Gly Ala Asp Glu Gln Asp Arg Ala Leu Leu Ala740 745 750 Leu Val Glu Arg Phe Gly Ile Val Asn Ile Asn Arg Ser Val PheAsp 755 760 765 Glu Ala Pro Gly Phe Ala Glu Ala Val Phe Gly Leu Leu ArgSer Gly 770 775 
780 Arg Leu Gln Ala Pro Glu Ala Tyr Arg Glu Ala Tyr AlaAsp Phe Leu 785 790 795 800 Gln Gly Thr Gly Gln Ala Pro Pro Ala Leu LeuSer Glu Ser Leu Lys 805 810 815 Gln Leu Thr Asp Lys Glu Ala Ala Ile PheAla Cys Leu Leu Arg Gly 820 825 830 Leu Ser Asn Ser Glu Ile Ser Ala SerThr Gly Ile Ala Leu Ser Thr 835 840 845 Thr Lys Trp His Leu Lys Asn IleTyr Ser Lys Leu Ser Leu Ser Gly 850 855 860 Arg Thr Glu Ala Ile Leu AlaMet Gln Ala Arg Asn Gly 865 870 875 513 base pairs nucleic acid singlelinear unknown 32 ATGAACGGCC TGCTCATGCA ATGGCAAGCG CGCCTGGCGC AGAACCCTTTGATGCTGCGC 60 TGGCAGGGCC TGCCGCCACG CGACCGGCTG GCCCTGGGCC TGCTCGCTGCCTTCCTGTTG 120 CTGGTGCTGC TGTACCTGTT GCTGTGGCGG CCGGTCAGCC AGAACCTGGAGCGGGCGCGC 180 GGCTTCCTGC AGCAGCAGCG TACGCTGCAC GCCTACCTGC AGGAGCATGCACCGCAGGTG 240 CGGGCACGGC AGGTCGCACC GCAGGCCAGT ATCGAGCCTG CCGCGCTGCAGGGGTTGGTG 300 ACCGCCAGTG CCGCCAGCCA GGGGCTGAAT GTCGAGCGTC TGGACAACCAGGGTGATGGT 360 GGCCTGCAGG TGAGCCTGCA GCCGGTCGAG TTCGCCCGTC TGCTGCAGTGGCTGGTGAGC 420 CTGCAGGAGC AGGGCGTGCG CGTCGAAGAG GCCGGTCTGG AACGTGCCGACAAGGGGCTG 480 GTGAGCAGCC GCCTGCTGCT GCGTGCCGGT TGA 513 170 amino acidsamino acid single linear unknown 33 Met Asn Gly Leu Leu Met Gln Trp GlnAla Arg Leu Ala Gln Asn Pro 1 5 10 15 Leu Met Leu Arg Trp Gln Gly LeuPro Pro Arg Asp Arg Leu Ala Leu 20 25 30 Gly Leu Leu Ala Ala Phe Leu LeuLeu Val Leu Leu Tyr Leu Leu Leu 35 40 45 Trp Arg Pro Val Ser Gln Asn LeuGlu Arg Ala Arg Gly Phe Leu Gln 50 55 60 Gln Gln Arg Thr Leu His Ala TyrLeu Gln Glu His Ala Pro Gln Val 65 70 75 80 Arg Ala Arg Gln Val Ala ProGln Ala Ser Ile Glu Pro Ala Ala Leu 85 90 95 Gln Gly Leu Val Thr Ala SerAla Ala Ser Gln Gly Leu Asn Val Glu 100 105 110 Arg Leu Asp Asn Gln GlyAsp Gly Gly Leu Gln Val Ser Leu Gln Pro 115 120 125 Val Glu Phe Ala ArgLeu Leu Gln Trp Leu Val Ser Leu Gln Glu Gln 130 135 140 Gly Val Arg ValGlu Glu Ala Gly Leu Glu Arg Ala Asp Lys Gly Leu 145 150 155 160 Val SerSer Arg Leu Leu Leu Arg Ala Gly 165 170 1176 base pairs nucleic acidsingle linear unknown 34 GATCTCGAGG 
GCGTCGGCTT CGACACCCTG GCGGTGCGCGCCGGTCAGCA TCGCACGCCG 60 GAGGGCGAGC ATGGCGAGGC CATGTTCCTC ACCTCCAGCTATGTGTTCCG CAGCGCCGCC 120 GACGCCGCCG CGCGCTTCGC CGGCGAGCAG CCGGGCAACGTCTACTCGCG CTACACCAAC 180 CCGACCGTGC GCGCCTTCGA GGAGCGCATC GCCGCCCTGGAAGGCGCCGA GCAGGCGGTG 240 GCCACCGCCT CCGGCATGGC CGCCATCCTG GCCATCGTCATGAGCCTGTG CAGCGCCGGC 300 GACCATGTGC TGGTGTCGCG CAGCGTGTTC GGCTCGACCATCAGCCTGTT CGAGAAGTAC 360 CTCAAGCGCT TCGGCATCGA GGTGGACTAC CCGCCGCTGGCCGATCTGGA CGCCTGGCAG 420 GCAGCCTTCA AGCCCAACAC CAAGCTGCTG TTCGTCGAATCGCCGTCCAA CCCGTTGGCC 480 GAGCTGGTGG ACATAGGCGC CCTGGCCGAG ATCGCCCACGCCCGCGGCGC CCTGCTGGCG 540 GTGGACAACT GCTTCTGCAC CCCGGCCCTG CAGCAGCCGCTGGCGCTGGG CGCCGATATG 600 GTCATGCATT CGGCGACCAA GTTCATCGAT GGCCAGGGCCGCGGCCTGGG CGGCGTGGTG 660 GCCGGGCGCC GTGCGCAGAT GGAGCAGGTG GTCGGCTTCCTGCGCACCGC CGGGCCGACC 720 CTCAGCCCGT TCAACGCCTG GATGTTCCTC AAGGGCCTGGAGACCCTGCG TATCCGCATG 780 CAGGCGCAGA GCGCCAGCGC CCTGGAACTG GCCCGCTGGTTGGAGACCCA GCCGGGCATC 840 GACAGGGTCT ACTATGCCGG CCTGCCCAGC CACCCGCAGCACGAGCTGGC CAAGCGGCAG 900 CAGAGTGCCT TCGGCGCGGT GCTGAGCTTC GAGGTCAAGGGCGGCAAGGA GGCGGCCTGG 960 CGTTTCATCG ATGCCACCCG GGTGATCTCC ATCACCACCAACCTGGGCGA TACCAAGACC 1020 ACCATCGCCC ATCCGGCGAC CACCTCCCAC GGTCGTCTGTCGCCGCAGGA GCGCGCCAGC 1080 GCCGGTATCC GCGACAACCT GGTGCGTGTC GCCGTGGGCCTGGAAGACGT GGTCGACCTC 1140 AAGGCCGACC TGGCCCGTGG CCTGGCCGCG CTCTGA 1176392 amino acids amino acid single linear unknown 35 Tyr Asp Leu Glu GlyVal Gly Phe Asp Thr Leu Ala Val Arg Ala Gly 1 5 10 15 Gln His Arg ThrPro Glu Gly Glu His Gly Glu Ala Met Phe Leu Thr 20 25 30 Ser Ser Tyr ValPhe Arg Ser Ala Ala Asp Ala Ala Ala Arg Phe Ala 35 40 45 Gly Glu Gln ProGly Asn Val Tyr Ser Arg Tyr Thr Asn Pro Thr Val 50 55 60 Arg Ala Phe GluGlu Arg Ile Ala Ala Leu Glu Gly Ala Glu Gln Ala 65 70 75 80 Val Ala ThrAla Ser Gly Met Ala Ala Ile Leu Ala Ile Val Met Ser 85 90 95 Leu Cys SerAla Gly Asp His Val Leu Val Ser Arg Ser Val Phe Gly 100 105 110 Ser ThrIle Ser Leu Phe Glu Lys Tyr Leu Lys Arg Phe Gly Ile Glu 115 120 125 ValAsp Tyr Pro 
Pro Leu Ala Asp Leu Asp Ala Trp Gln Ala Ala Phe 130 135 140Lys Pro Asn Thr Lys Leu Leu Phe Val Glu Ser Pro Ser Asn Pro Leu 145 150155 160 Ala Glu Leu Val Asp Ile Gly Ala Leu Ala Glu Ile Ala His Ala Arg165 170 175 Gly Ala Leu Leu Ala Val Asp Asn Cys Phe Cys Thr Pro Ala LeuGln 180 185 190 Gln Pro Leu Ala Leu Gly Ala Asp Met Val Met His Ser AlaThr Lys 195 200 205 Phe Ile Asp Gly Gln Gly Arg Gly Leu Gly Gly Val ValAla Gly Arg 210 215 220 Arg Ala Gln Met Glu Gln Val Val Gly Phe Leu ArgThr Ala Gly Pro 225 230 235 240 Thr Leu Ser Pro Phe Asn Ala Trp Met PheLeu Lys Gly Leu Glu Thr 245 250 255 Leu Arg Ile Arg Met Gln Ala Gln SerAla Ser Ala Leu Glu Leu Ala 260 265 270 Arg Trp Leu Glu Thr Gln Pro GlyIle Asp Arg Val Tyr Tyr Ala Gly 275 280 285 Leu Pro Ser His Pro Gln HisGlu Leu Ala Lys Arg Gln Gln Ser Ala 290 295 300 Phe Gly Ala Val Leu SerPhe Glu Val Lys Gly Gly Lys Glu Ala Ala 305 310 315 320 Trp Arg Phe IleAsp Ala Thr Arg Val Ile Ser Ile Thr Thr Asn Leu 325 330 335 Gly Asp ThrLys Thr Thr Ile Ala His Pro Ala Thr Thr Ser His Gly 340 345 350 Arg LeuSer Pro Gln Glu Arg Ala Ser Ala Gly Ile Arg Asp Asn Leu 355 360 365 ValArg Val Ala Val Gly Leu Glu Asp Val Val Asp Leu Lys Ala Asp 370 375 380Leu Ala Arg Gly Leu Ala Ala Leu 385 390 847 base pairs nucleic acidsingle linear unknown 36 ATGCTGAAAA AGCTGTTCAA GTCGTTTCGT TCACCTCTCAAGCGCCAAGC ACGCCCCCGC 60 AGCACGCCGG AAGTTCTCGG CCCGCGCCAG CATTCCCTGCAACGCAGCCA GTTCAGCCGC 120 AATGCGGTAA ACGTGGTGGA GCGCCTGCAG AACGCCGGCTACCAGGCCTA TCTGGTCGGC 180 GGCTGCGTAC GCGACCTGCT GATCGGCGTG CAGCCCAAGGACTTCGACGT GGCCACCAGC 240 GCCACCCCCG AGCAGGTGCG GGCCGAGTTT CGCAACGCCCGGGTGATCGG CCGCCGCTTC 300 AAGCTGGCGC ATGTGCATTT CGGCCGCGAG ATCATCGAGGTGGCGACCTT CCACAGCAAC 360 CACCCGCAGG GCGACGACGA GGAAGACAGC CACCAGTCGGCCCGTAACGA GAGCGGGCGC 420 ATCCTGCGCG ACAACGTCTA CGGCAGTCAG GAGAGCGATGCCCAGCGCCG CGACTTCACC 480 ATCAACGCCC TGTACTTCGA CGTCAGCGGC GAGCGCGTGCTGGACTATGC CCACGGCGTG 540 CACGACATCC GCAACCGCCT GATCCGCCTG ATCGGCGACCCCGAGCAGCG CTACCTGGAA 600 GACCCGGTAC GCATGCTGCG 
CGCCGTACGC TTCGCCGCCAAGCTGGACTT CGACATCGAG 660 AAACACAGCG CCGCGCCGAT CCGCCGCCTG GCGCCGATGCTGCGCGACAT CCCTGCCGCG 720 CGCCTGTTCG ACGAGGTGCT CAAGCTGTTC CTCGCCGGCTACGCCGAGCG CACCTTCGAA 780 CTGCTGCTCG AGTACGACCT GTTCGCCCCG CTGTTCCCGGCCAGCGCCCG CGCCCTGGAG 840 CGCGATC 847 282 amino acids amino acid singlelinear unknown 37 Met Leu Lys Lys Leu Phe Lys Ser Phe Arg Ser Pro LeuLys Arg Gln 1 5 10 15 Ala Arg Pro Arg Ser Thr Pro Glu Val Leu Gly ProArg Gln His Ser 20 25 30 Leu Gln Arg Ser Gln Phe Ser Arg Asn Ala Val AsnVal Val Glu Arg 35 40 45 Leu Gln Asn Ala Gly Tyr Gln Ala Tyr Leu Val GlyGly Cys Val Arg 50 55 60 Asp Leu Leu Ile Gly Val Gln Pro Lys Asp Phe AspVal Ala Thr Ser 65 70 75 80 Ala Thr Pro Glu Gln Val Arg Ala Glu Phe ArgAsn Ala Arg Val Ile 85 90 95 Gly Arg Arg Phe Lys Leu Ala His Val His PheGly Arg Glu Ile Ile 100 105 110 Glu Val Ala Thr Phe His Ser Asn His ProGln Gly Asp Asp Glu Glu 115 120 125 Asp Ser His Gln Ser Ala Arg Asn GluSer Gly Arg Ile Leu Arg Asp 130 135 140 Asn Val Tyr Gly Ser Gln Glu SerAsp Ala Gln Arg Arg Asp Phe Thr 145 150 155 160 Ile Asn Ala Leu Tyr PheAsp Val Ser Gly Glu Arg Val Leu Asp Tyr 165 170 175 Ala His Gly Val HisAsp Ile Arg Asn Arg Leu Ile Arg Leu Ile Gly 180 185 190 Asp Pro Glu GlnArg Tyr Leu Glu Asp Pro Val Arg Met Leu Arg Ala 195 200 205 Val Arg PheAla Ala Lys Leu Asp Phe Asp Ile Glu Lys His Ser Ala 210 215 220 Ala ProIle Arg Arg Leu Ala Pro Met Leu Arg Asp Ile Pro Ala Ala 225 230 235 240Arg Leu Phe Asp Glu Val Leu Lys Leu Phe Leu Ala Gly Tyr Ala Glu 245 250255 Arg Thr Phe Glu Leu Leu Leu Glu Tyr Asp Leu Phe Ala Pro Leu Phe 260265 270 Pro Ala Ser Ala Arg Ala Leu Glu Arg Asp 275 280

What is claimed:
 1. An isolated nucleic acid encoding a kinase from a Pseudomonad that can regulate the expression of a lipase.
 2. The isolated nucleic acid of claim 1 that hybridizes under stringent conditions to nucleic acid having the sequence as shown in SEQ ID NO: 1.
 3. The isolated nucleic acid of claim 2 that has the sequence as shown in SEQ ID NO: 1.
 4. An expression vector comprising the isolated nucleic acid of claim 2.
 5. An expression vector comprising the isolated nucleic acid of claim 3.
 6. A host cell comprising the expression vector of claim 4.
 7. The host cell of claim 6 that is a Bacterium.
 8. The host cell of claim 6 that is a Pseudomonad.
 9. The host cell of claim 6 that further comprises nucleic acid encoding a desired protein.
 10. The host cell of claim 9 where the desired protein is an enzyme.
 11. The host cell of claim 10 wherein the enzyme includes esterases, hydrolases, lipases, isomerases, mutases, transferases, kinases, and phosphatases.